Peter,

I did some work on usfm2osis.pl to make it work with several USFM source texts,
but I don't know whether my revision ever made its way onto the server (I think
I sent version 1.4 to Chris). I can send it to you if you like. It fixes a few
problems and slightly extends the script's support for USFM tags. Perhaps the
biggest contribution I made was to identify all of the USFM tags (using a UBS
handbook) and add comments to the .pl file noting which tags usfm2osis.pl
supports in some fashion (keep in mind that the script doesn't always support
every tag well) and which it doesn't. There are some obscure and some
not-so-obscure tags that it doesn't do anything with. It also fixes, though
imperfectly, some odd tagging that results from the way verse eIDs are
produced. I don't believe anyone but me has tested the changes, so any
suggestions you have would be welcome.


One thing I learned while working with it is that it doesn't handle multiple
notes in a verse well unless they are on separate lines in the source file.
Otherwise a note is often only partially converted to OSIS, with backslashes
from the USFM left behind. If I were you, I would open all the source files in
jEdit and use search-and-replace across all buffers on the opening note tags
(\f and \x, if I remember correctly) to add a line break before each note,
taking care not to split a note, so that the script catches all the notes.
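The jEdit search-and-replace described above can also be scripted. Below is a
minimal Python sketch of that preprocessing step, assuming UTF-8 source files
and that the opening note markers are "\f " and "\x " followed by a space (as
recalled above); other note types such as \fe endnotes are not handled. The
function names break_before_notes and leftover_usfm are hypothetical and are
not part of usfm2osis.pl.

```python
import re
import sys
from pathlib import Path

# Assumption: note openers are "\f " (footnote) and "\x " (cross reference).
# Inner markers (\fr, \ft, \xo, \xt, ...) and closers (\f*, \x*) do not match,
# because the character after f/x is not a space.
# (?<=\S) only fires when the opener follows visible text on the same line,
# so openers already at the start of a line are left alone.
NOTE_OPENER = re.compile(r"(?<=\S)[ \t]*(\\[fx] )")


def break_before_notes(usfm: str) -> str:
    """Move each opening note marker onto its own line.

    The break goes before the opener, never inside a note, so no note is
    split. Running it twice gives the same result as running it once.
    """
    return NOTE_OPENER.sub(r"\n\1", usfm)


def leftover_usfm(osis: str) -> list[str]:
    """Return lines of converted OSIS output that still contain a backslash,
    the symptom of a partially converted note."""
    return [line for line in osis.splitlines() if "\\" in line]


def main(paths: list[str]) -> None:
    for name in paths:
        path = Path(name)
        text = path.read_text(encoding="utf-8")  # assumes UTF-8 source
        fixed = break_before_notes(text)
        if fixed != text:
            # Keep a backup copy before rewriting the file in place.
            path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
            path.write_text(fixed, encoding="utf-8")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it over the .usfm files before usfm2osis.pl, then passing the OSIS
output through leftover_usfm, should show whether any notes were still missed.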

I should add, though, that I have only worked with LTR texts so far.

Daniel 


-----Original Message-----
From: "Chris Little" <[EMAIL PROTECTED]>
To: "SWORD Developers' Collaboration Forum" <[email protected]>
Sent: 11/13/08 9:33 AM
Subject: Re: [sword-devel] usfm2osis.pl

Not that I very much desire to open usfm2osis.pl again, but could you 
post an example? I'm having trouble guessing what kind of input is 
resulting in what kind of output.

And what's the encoding of the text? UTF-8, an 8-bit encoding, or 
something else?

--Chris


Peter von Kaehne wrote:
> Thanks to Chris, who rewrote usfm2osis a while back, it works a lot better
> with UTF-8 texts.
> 
> A persistent problem I have with RTL texts, though, is the handling of
> footnotes:
> 
> Because the output is essentially a bidi text, with LTR tags and RTL
> content, inline LTR tags often get messed up. This mostly affects the
> <note> and </note> tags, where the order of "note", the slash and the
> angle brackets gets mixed up. As a result, these often require difficult
> fixing by hand.
> 
> Is there a way to improve usfm2osis.pl in this respect?
> 
> Thanks!
> 
> Peter
> 
> _______________________________________________
> sword-devel mailing list: [email protected]
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
