On 07/31/2012 04:16 PM, Chris Little wrote:
> My new usfm2osis.py script is progressing quite nicely. I've got it 
> generating valid OSIS from one Bible that uses a very minimal set of USFM 
> elements. At the moment, I'm working to make it process all tags present 
> within the USFM versions of the WEB and RV, and this has raised an issue.
>
> I've been working primarily with the USFM reference from UBS ICAP, treating 
> it as a sort of specification. My question is: should this new utility accept 
> USFM that does not conform to the reference at UBS ICAP?
>
> Should it accept & interpret USFM tags that are not present in the reference?
>
> One specific example is that the WEB uses \fqa*, which is obviously intended 
> as an end-tag version of \fqa (used to mark alternate translations). But the 
> USFM reference does not identify this as a valid end-tag, by my reading.
>
> So... should we...
>
> a) Make the new utility accept non-conformant USFM (from the perspective of 
> the USFM reference). I'm leery of this, since one of my reasons for writing 
> the new utility was to keep it pristinely spec-conformant and I have a 
> feeling we might start incorporating tags and syntax that are less obviously 
> interpretable than \fqa*.

Yes, in the case of \fqa* and other tags that were actually historically part 
of USFM. The alternative is excessive unnecessary handling of "exceptions". At 
one point, the USFM standard could have reasonably been interpreted such that 
all "character" styles had an explicit end marker, and by extension, the 
implicit ability within the markup to support character style nesting or 
stacking. However, because Paratext never supported that, simply starting 
another style, such as \ft in the case of \fqa, ended any other active 
character style.
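
To make that behavior concrete, here is a rough Python sketch of how a reader 
could treat both cases the same way: an explicit end marker closes the active 
character style, and so does the start of any other character style. The 
function name split_spans and the simplified marker pattern are my own 
assumptions for illustration, not anything taken from Paratext or from 
usfm2osis.py.

```python
import re

# Simplified, assumed marker syntax: backslash, lowercase letters, then either
# "*" (explicit end marker) or one whitespace character / end of input.
MARKER = re.compile(r"\\([a-z]+)(?:(\*)|\s|$)")

def split_spans(content):
    """Split note content into (active_style, text) pairs.

    Starting any new character style implicitly ends the active one;
    an explicit end marker such as \\fqa* ends it as well.
    """
    spans = []
    active = None   # currently open character style, if any
    pos = 0
    for match in MARKER.finditer(content):
        text = content[pos:match.start()]
        if text:
            spans.append((active, text))
        name, is_end = match.group(1), match.group(2)
        # An end marker leaves no style active; an opening marker replaces it.
        active = None if is_end else name
        pos = match.end()
    tail = content[pos:]
    if tail:
        spans.append((active, tail))
    return spans
```

With this approach, "\fqa Or, sea\ft of reeds" and "\fqa Or, sea\fqa* of reeds" 
both end the \fqa span after "sea"; the only difference is whether the text 
that follows is attributed to \ft or to no character style at all.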

In a future iteration of USFM, character style stacking will be allowed with a 
different syntax, with "+" inserted between the "\" and the opening character 
style indicator. This will also likely require explicit end markers to come 
back, at least when the "+" syntax is used.

In addition to the USFM specification at 
http://paratext.ubs-translations.org/about/usfm, Paratext itself is a de facto 
part of the standard, and it has no problem with reading \fqa*. It just doesn't 
generate it.

I didn't notice until now that \fqa* had quietly disappeared from the USFM 
standard. Bibledit still supports it, as do my USFM-handling routines.

> b) Write a separate utility to convert common and interpretable 
> non-conformant tags/syntax to conformant markup.

This could work. For example, a global search and replace of "\fq*" with "\ft " 
(including the space) would take care of that one marker, resulting in USFM 
markup that conforms better to USFM 2.35. More generally, "\f?*" could be 
replaced with "\ft ", where ? is a wild card and * is not. Likewise, "\x?*" 
could be replaced with "\xt ".
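
As a rough illustration of that kind of search and replace, here is a minimal 
Python sketch. The marker lists are my assumption about which note-internal 
character styles might turn up with end markers, not a list taken from the 
USFM reference, and normalize_end_markers is a hypothetical name. Using 
explicit lists rather than a bare wildcard avoids touching markers such as 
\fig*, and \f* and \x* (the note end markers themselves) are left alone.

```python
import re

# Assumed lists of character styles used inside footnotes and cross-references.
FOOTNOTE_CHAR_MARKERS = ("fr", "fk", "fq", "fqa", "fl", "fp", "ft")
XREF_CHAR_MARKERS = ("xo", "xk", "xq", "xt")

def _end_marker_pattern(markers):
    """Build a regex matching '\\<marker>*' for any marker in the list."""
    alternatives = "|".join(re.escape(m) for m in markers)
    return re.compile(r"\\(?:" + alternatives + r")\*")

FOOTNOTE_END = _end_marker_pattern(FOOTNOTE_CHAR_MARKERS)
XREF_END = _end_marker_pattern(XREF_CHAR_MARKERS)

def normalize_end_markers(usfm):
    """Replace character style end markers with a following \\ft or \\xt."""
    # A function is used as the replacement so that re.sub does not try to
    # interpret the backslash in the replacement text.
    usfm = FOOTNOTE_END.sub(lambda m: "\\ft ", usfm)
    usfm = XREF_END.sub(lambda m: "\\xt ", usfm)
    return usfm
```

For example, "\f + \fqa Or, sea\fqa* text\f*" would come out as 
"\f + \fqa Or, sea\ft  text\f*". Note the doubled space where the original 
already had a space after the end marker; a real converter would probably want 
to collapse that.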

> c) Add a command-line switch to usfm2osis.py so that it performs a 
> pre-processing step of making non-conformant tags/syntax into conformant 
> markup. (This would be the same as option b, but would place everything in a 
> single utility.)

Processing character style end markers all the time is unlikely to cause a 
problem, so there would be no reason to turn that switch off. However, if you 
had processing for common mistakes, like writing \q where \pi belongs, that 
might better be put in an optional preprocessor.
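
One way that split might look, sketched in Python with argparse: end marker 
handling always runs, and the mistake-fixing step runs only on request. The 
--fix-common-mistakes option, the function names, and the replacements mapping 
are all hypothetical; none of them are existing usfm2osis.py features.

```python
import argparse

def normalize_end_markers(usfm):
    # Always-on step. Only \fqa* is handled here to keep the sketch short.
    return usfm.replace("\\fqa*", "\\ft ")

def fix_common_mistakes(usfm, replacements):
    # Optional step. 'replacements' is an assumed project-specific mapping of
    # wrong marker -> intended marker, for example {"\\q ": "\\pi "}.
    for wrong, intended in replacements.items():
        usfm = usfm.replace(wrong, intended)
    return usfm

def build_parser():
    parser = argparse.ArgumentParser(description="USFM preprocessing sketch")
    parser.add_argument(
        "--fix-common-mistakes",
        action="store_true",
        help="also apply project-specific marker corrections",
    )
    return parser

def preprocess(usfm, args, replacements=None):
    usfm = normalize_end_markers(usfm)      # no switch needed for this
    if args.fix_common_mistakes:            # opt-in only
        usfm = fix_common_mistakes(usfm, replacements or {})
    return usfm
```

The corrections are passed in rather than hard-coded because a substitution 
like \q to \pi is only right for a particular project, and applying it blindly 
would damage text where \q was intended.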

> d) Punt on the issue, and let those performing conversion deal with 
> non-conformant markup on a case by case basis.

That is a reasonable alternative for cases where the intentions of the markup 
are unclear or where the nonconformities are not consistent. A great example of 
a place to punt is in the \z namespace. There is really no way to know if a 
custom marker has an end marker or not, or if text associated with it should be 
ignored or not, unless you get project-specific instructions with the marker. 
Likewise, some people make up their own custom markers that don't start with a 
z. There are also markups that predate USFM.

I hope this helps...

Michael
MLJohnson.org


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
