Since HTML files can be opened in Microsoft Word, my first step is to open the
file there and save it as RTF.
I then use WordPad to open and resave the RTF file. This reduces size and
clutter.
At this stage, you need to determine whether any of the text styles are
semantically significant. For example, are italics used for added words? And
has anything important already been lost?
The key point is that RTF files can be processed by scripts or filters.
You soon learn which of the tags (RTF control words) are useful.
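
To illustrate the kind of filter meant here, below is a minimal Python sketch
that wraps italic runs in placeholder tags. It assumes the italics are switched
on with \i and off with \i0 (the RTF control words for italic); the [[i]] and
[[/i]] markers are hypothetical names chosen only because they cannot clash
with RTF syntax. Italics scoped by a group, such as {\i text} with no closing
\i0, are not handled.

```python
import re

# Hypothetical placeholder tags (an assumption, not part of RTF).
OPEN, CLOSE = "[[i]]", "[[/i]]"

# \i turns italics on and \i0 turns them off. The lookahead keeps the
# pattern from matching longer control words such as \insrsid, and the
# optional space is the delimiter that belongs to the control word.
ITALIC_RUN = re.compile(r"\\i(?![a-z0-9]) ?(.*?)\\i0 ?", re.DOTALL)


def mark_italics(rtf: str) -> str:
    """Replace each \\i ... \\i0 run with [[i]] ... [[/i]]."""
    return ITALIC_RUN.sub(lambda m: OPEN + m.group(1) + CLOSE, rtf)


if __name__ == "__main__":
    sample = r"In the beginning \i was\i0  the Word"
    print(mark_italics(sample))
```

Because the markers are ordinary text, they should pass through a later save
as plain text untouched.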
Once such words have been marked with some non-RTF tags, so that the next
step no longer loses the markup, open the file with WordPad and save it as
Unicode text (which gives UCS-2, a.k.a. UTF-16 LE).
Then open the text file with, e.g., Notepad++, convert the encoding to UTF-8,
and resave.
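
If you would rather script the re-encoding than do it in an editor, a minimal
Python sketch could look like this. It assumes the Unicode text file starts
with a byte-order mark, which Python's "utf-16" codec uses to pick the byte
order. The function names and the file names in the comment are made up for
illustration.

```python
from pathlib import Path


def utf16_bytes_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 (byte order taken from the BOM) and re-encode as UTF-8."""
    # The "utf-16" codec consumes the BOM, so the UTF-8 output has none.
    return data.decode("utf-16").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Convert a whole file; working on bytes keeps line endings unchanged."""
    Path(dst).write_bytes(utf16_bytes_to_utf8(Path(src).read_bytes()))


# Example call with hypothetical file names:
# convert_file("kd_unicode.txt", "kd_utf8.txt")
```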
Now the rest of the scripting can be done on the plain text.
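
As one example of such scripting, here is a minimal Python sketch that turns
marked words into OSIS markup. It rests on two assumptions: that italics really
do denote added words in the text at hand, and that they were marked earlier
with hypothetical placeholders of the form [[i]]...[[/i]]. OSIS marks added
words with <transChange type="added">.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax (an assumption): [[i]]added words[[/i]]
MARKED = re.compile(r"\[\[i\]\](.*?)\[\[/i\]\]", re.DOTALL)


def to_osis_added(text: str) -> str:
    """Escape the text for XML, then wrap marked runs in transChange."""
    # Escaping first is safe: the placeholders contain no &, < or >.
    safe = escape(text)
    return MARKED.sub(r'<transChange type="added">\1</transChange>', safe)


if __name__ == "__main__":
    print(to_osis_added("And God saw [[i]]that it was[[/i]] good"))
```

This covers only the added-words case; verse, chapter and book structure would
need their own handling.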
This mixed, general-purpose approach has worked well for me on several
projects.
[The first step can be done using LibreOffice, if that’s what you prefer.]
Best regards,
David
On Fri, Feb 1, 2019 at 22:07, Dudeck, John <john.dud...@sim.org> wrote:
> I might just say from my recent experience, creating OSIS from other sources
> is not a trivial matter.
>
> Depending on whether you are creating a Bible, a Commentary, or a GenBook,
> the process is not the same.
>
> It took me two years to develop Perl scripts that convert from Logos XML to
> OSIS for Bibles, Commentaries, GenBooks, and Dictionaries.
>
> For example, even though Logos XML is well-structured, my converter for
> Bibles is customized to the three Bible texts that it converted, and using
> it for other Bibles will require further customization for each. Commentaries
> and GenBooks are handled in a more generic way, without need for further
> customization.
>
> OSIS is mainly a semantic markup scheme, highly adapted to Scripture, but
> little else. Since HTML is a totally flexible structure, you need a way to
> map the structural elements in your source to structural elements in OSIS.
> OSIS has very limited formatting capabilities. You need to have a way to
> deal with CSS. Rendering is mostly left up to the Client User Interface.
>
> I wish I had an HTML-to-OSIS converter to offer you, but maybe somebody else
> has come up with a method that is straightforward.
>
> John
>
>> Hello,
>> As the title says: does anyone have a Linux tool to convert HTML files to
>> OSIS?
>> In this case it is for the KD module. I downloaded the HTML source files,
>> but I don't want to do a lot of work on them. I will work on Bible issues
>> first, not commentary. But if someone has a tool that does the job quickly...
>
> John Dudeck
> Programmer at Editions Cle Lyon, France
> john.dud...@sim.org j...@editionscle.com
> --
> "All programmers are optimists." -- Frederick Brooks
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page