David,

You are probably right about TECkit <http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=TECkit>: if we can get the text, it will help us convert it to Unicode. As for how to get the text, your method is beyond my skills :) If you succeed, please let me know.
On 13/05/2019 16:21, David Haslam wrote:
> Given the insights from Michael Hart, it may be feasible to
> temporarily rearrange the main text stream as follows:
>
> 1. Replace every EOL by a horizontal tab.
> 2. Insert an EOL after each verse end character.
>
> Observe that the above two steps are wholly reversible such that the
> original text stream can be restored later.
>
> In effect the text stream is now in verse per line (VPL) layout,
> albeit without verse tags. Some adjustments may be necessary if there
> are any section headings, etc.
>
> 3. Add line numbers with the first number being reset to 1 at the
>    start of each chapter, numbers incrementing by 1 for each line.
> 4. Add a left margin USFM verse tag \v_
>
> Steps 3 & 4 can be implemented in various ways. For my part, I’d use a
> bespoke TextPipe filter.
>
> Another method to consider might be to use Excel formulae. I recall
> resorting to such a method in the early days of Go Bible.
>
> Now restore the original layout by reverting steps 2 & 1, if this is
> really necessary. That is, if the original text layout appeared to be
> paragraphed.
>
> 5. Decide how & where to insert paragraph tags.
>
> 6. Add chapter tags, book ID and main title tags, etc.
>
> Hope this gives some useful suggestions that point towards a practical
> solution.
>
> Best regards
>
> David
>
>
> Sent from ProtonMail Mobile
>
>
> On Mon, May 13, 2019 at 14:57, Michael H <cma...@gmail.com
> <mailto:cma...@gmail.com>> wrote:
>> Cyrille
>>
>> LibreOffice Draw attempts to open the PageMaker file, with limited
>> success. But it confirms that even in the PageMaker source, the verse
>> numbers are a separate text stream. With this source, there is no way
>> to copy the text with verse numbers intact. Each book appears to be
>> stored in its own text stream in the PageMaker file. LO Draw isn't
>> rendering all of the pages, only the first 10, so I've only explored
>> Matthew further.
>>
>> Based on Matthew only, the verses seem to all end with the character
>> "-" or ";/", which should aid in the reconstruction. I've looked
>> through the PDF and this seems to be the case for all books visually
>> as well. However, this isn't perfect: I find 1107 of these characters
>> in Matthew, instead of the expected 1071 verses. But since the text
>> stream has a book introduction, this is likely easily explained.
>> Hopefully this gets you well down the path to creating a stream with
>> verses.
>>
>> I would NOT start from the PDF file, but from the PageMaker file.
>> The PDF almost certainly has a lot of text rearranging and extra
>> characters like page numbers and running heads. PageMaker has the
>> book text in a single stream, in a form that will convert to Unicode
>> relatively easily.
>>
>
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
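For anyone without TextPipe or Excel, steps 1 to 4 quoted above could be sketched roughly as follows in Python. This is only an illustration under assumptions, not a tested converter: the function names are made up for this example, the verse end markers "-" and ";/" are taken from Michael's observation on Matthew, the input is assumed to contain no tab characters already (otherwise step 1 is not reversible), and each call to tag_chapter() is assumed to receive the text of one chapter only, since the thread does not say how chapter starts are marked.

```python
import re

# Verse end markers as reported for Matthew in the legacy-encoded text.
# Assumption: other books use the same markers.
VERSE_ENDS = ("-", ";/")


def _marker_regex(verse_ends=VERSE_ENDS):
    # Longest marker first, so a long marker is never split by a shorter one.
    ordered = sorted(verse_ends, key=len, reverse=True)
    return re.compile("(" + "|".join(re.escape(m) for m in ordered) + ")")


def to_vpl(text, verse_ends=VERSE_ENDS):
    """Steps 1 and 2: tabs replace the original EOLs, one verse per line."""
    flat = text.replace("\n", "\t")                       # step 1
    return _marker_regex(verse_ends).sub(r"\1\n", flat)   # step 2


def from_vpl(vpl):
    """Revert steps 2 and 1 to restore the original layout."""
    return vpl.replace("\n", "").replace("\t", "\n")


def tag_chapter(vpl):
    """Steps 3 and 4 for ONE chapter: number lines from 1, prefix with \\v."""
    lines = [line for line in vpl.split("\n") if line != ""]
    # Tabs inside each line are kept so the layout can still be restored.
    return "\n".join(f"\\v {n} {line}" for n, line in enumerate(lines, 1))


def count_verse_ends(text, verse_ends=VERSE_ENDS):
    """Sanity check: compare this with the expected verse count for a book."""
    return len(_marker_regex(verse_ends).findall(text))
```

A quick check such as from_vpl(to_vpl(text)) == text confirms the round trip on a sample, and count_verse_ends() gives the kind of figure Michael compared against the expected verse count. Section headings and the book introduction would still need handling by hand before the numbering is trusted.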