Cyrille, LibreOffice Draw attempts to open the PageMaker file, with limited success. But it confirms that even in the PageMaker source, the verse numbers are a separate text stream. With this source, there is no way to copy the text with verse numbers intact. Each book appears to be stored in its own text stream in the PageMaker file. LO Draw isn't rendering all of the pages, only the first 10, so I've only explored Matthew further.
Based on Matthew only, the verses all seem to end with the character "-" or ";/", which should aid in the reconstruction. I've looked through the PDF, and visually this seems to be the case for all books as well. However, this isn't perfect: I find 1107 of these characters in Matthew, instead of the expected 1071 verses. Since the text stream also contains a book introduction, the 36 extra are probably easy to account for. Hopefully this gets you well down the path to creating a stream with verses. I would NOT start from the PDF file, but from the PageMaker file. The PDF almost certainly has a lot of text rearranging and extra characters like page numbers and running heads. PageMaker has the book text in a single stream, in a form that will convert to Unicode relatively easily.
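To make the terminator idea concrete, here is a rough Python sketch of how the split and count could be done once a book's text stream has been extracted to a string. It is only an illustration: the function name split_verses and the sample string are made up, and the terminators "-" and ";/" are just what I observed in Matthew, so other books or the real legacy-font encoding may need a different set.

```python
import re

# Assumed verse terminators, taken from what was observed in Matthew only.
TERMINATORS = (";/", "-")

def split_verses(stream, terminators=TERMINATORS):
    """Split a book's text stream into chunks that each end in a terminator.

    Returns (chunks, tail), where tail is any text left after the last
    terminator. Hypothetical helper, not part of any existing tool.
    """
    # Try longer terminators first so ";/" is matched as a unit.
    ordered = sorted(terminators, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
    # With a capture group, re.split alternates: text, terminator, text, ...
    parts = re.split(pattern, stream)
    chunks = [(text + term).strip()
              for text, term in zip(parts[0::2], parts[1::2])]
    tail = parts[-1].strip()
    return chunks, tail

if __name__ == "__main__":
    sample = "intro text- verse one;/ verse two- trailing"
    chunks, tail = split_verses(sample)
    print(len(chunks), "terminated chunks; leftover:", repr(tail))
```

Comparing len(chunks) against the expected verse count for a book (1071 for Matthew) shows how many chunks are extra; the ones coming from the book introduction would then have to be dropped or merged by hand or by some further rule.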
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page