I'm breaking my long period of ignoring
and avoiding OSIS, and working on building a USFX to MOSIS
converter into the open source Haiola software, both into the UI
tool and as a stand-alone cross-platform executable. The "M" in
"MOSIS" is for "Modified". The only significant modification is a
shift in the semantics of <q who="Jesus" sID="somethingunique"
marker=""> to be used only in milestone form, only for
quotations by Jesus (as an equivalent of the <wj> tag), and,
rather than only at the beginning and end of the quotation, to
stop and start at verse boundaries. The proper quotation
punctuation for each translation is always in the text of the
translation, where almost all translators believe it belongs. The
result is not exactly in line with the original intentions of
OSIS, but should validate against the Schema fine, and actually be
easier to display. This is a fairly harmless exception for Sword
use, since the result is processed to display on a verse-by-verse
basis, anyway.
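
To make the verse-boundary rule above concrete, here is a minimal
sketch in Python (not the C# code in Haiola, and not its actual
logic). It assumes the text has already been split into verses,
each a list of (text, spoken_by_jesus) spans. The function name,
the input shape, the ".wjN" ID scheme, and the exact form of the
end milestone (<q eID="..."/>) are my assumptions for illustration;
only the start milestone attributes come from the description above.

```python
from xml.sax.saxutils import escape, quoteattr


def verses_to_mosis(verses):
    """Emit OSIS-style milestones for words of Jesus, closed per verse.

    verses: list of (osis_id, spans), where spans is a list of
    (text, spoken_by_jesus) tuples. Hypothetical input shape.
    """
    out = []
    counter = 0  # used to build unique sID/eID values (assumed scheme)
    for osis_id, spans in verses:
        vid = quoteattr(osis_id)
        out.append(f"<verse osisID={vid} sID={vid}/>")
        for text, by_jesus in spans:
            if by_jesus:
                counter += 1
                qid = quoteattr(f"{osis_id}.wj{counter}")
                # Start milestone: a quotation never stays open across
                # a verse boundary, so each verse gets its own pair.
                out.append(f'<q who="Jesus" sID={qid} marker=""/>')
                out.append(escape(text))
                # End milestone form is an assumption (OSIS sID/eID pairing).
                out.append(f"<q eID={qid}/>")
            else:
                out.append(escape(text))
        out.append(f"<verse eID={vid}/>")
    return "".join(out)


if __name__ == "__main__":
    sample = [
        ("John.3.10", [("Jesus answered him, ", False),
                       ("Are you the teacher of Israel?", True)]),
        ("John.3.11", [("We speak that which we know.", True)]),
    ]
    print(verses_to_mosis(sample))
```

A quotation that runs from John 3:10 into 3:11 comes out as two
separate milestone pairs, one inside each verse, and the quotation
marks themselves stay in the translated text (marker="" suppresses
generated ones).
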

Getting the conversion right takes more than simple replacement of
tags, e.g. with awk, as anyone who really understands both the
source and destination standards will see. I'm working in C#,
because that is the tool I know best, although other languages
could work, too. It is the actual logic implemented that matters.

There is a fair amount of varying interpretation of what USFM
markers should mean, some historical artifacts left over from when
other SFM predecessors were in use with different meanings, and
some intentional variation from the current USFM standard. Even so,
I have 241 USFX Scripture texts in 237 dialects of 212 languages
that are all "clean" enough with respect to markup that I can and
did produce web sites from them. They should be clean enough to
convert to Sword modules in an automated fashion. Eleven of those
are in the Public Domain. The rest are available under the terms at
http://PNGScriptures.org/terms.htm.

The exceptions to markup cleanness that remain are generally
problems with peripheral materials other than the actual
Scriptures, which could be stripped out until such time as someone
manually cleans them up. Some of the metadata expected by OSIS
isn't present in raw USFM source, but I have that stored in other
XML files in Haiola project configurations, so I'll pull that in
for the merge.

I have more texts that can be added to the set of 241 mentioned
above, but I haven't cleaned them up and processed them yet. So
much work, so little time... time to pray and code!

On 11/08/2012 12:39 AM, Chris Burrell wrote:

Thanks for all the info. On the last point, I did mean read
directly from USFM. I don't know the format well-enough, but
presumably if other software uses it, then maybe we could have a go
at displaying the best we can...

--
Aloha,
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page