The ThanLwinSoft software was indeed developed by Keith Stribley (1976-2011).
Screenshot posted to my Facebook timeline.
https://m.facebook.com/story.php?story_fbid=10213794210749822&id=1243443528
We had exchanged emails during the year before he died.
Best regards
David
Sent from ProtonMail Mobile
On Tue, May 14, 2019 at 14:08, Cyrille <lafricai...@gmail.com> wrote:
> I'm sending my message again because the first one was too big.
>
> The conversion to UTF-8 is 99% solved!! I used an online converter:
> https://thanlwinsoft.github.io/www.thanlwinsoft.org/ThanLwinSoft/MyanmarUnicode/Conversion/myanmarConverter.html
> or:
> http://burglish.my-mm.org/latest/trunk/web/fontconv.htm
>
> See the result
> [here](https://framadrop.org/r/jKnYnvuQIH#mE+FWcvzD1N/Omnfr7uWMZmI/HZUUVPdvnVVkBFyFrA=).
>
> Now the only problem is how to get the verse and chapter number...
>
> On 14/05/2019 13:53, Michael H wrote:
>
>> Cyrille, (Peter),
>>
>> Maybe further discussion on this belongs in Gitlab as issues. Can I get
>> added to this project?
>>
>> Here are the first few lines of Matthew copied from the PDF:
>> ------
>>
>> &Sifrmaw;OD; {0Ha*vdusrf;
>> The Gospel According to Matthew
>> ed'gef;
>> usr;f ûyy*k Kd¾v f &iS rf maw;O;D \b0rwS wf r;f
>> usr;f ûyy*k Kd¾v f &iS rf maw;O;Don f *gavav;,e,rf S*sL;vrl sK;d tmvaf z;O;D
>> \om;jzp\f / (rmu k2;14)
>> olonf tcGefcHoltjzpf trIxrf;chJonf/ (vk 5;27)
>> a,Zl;ocif\aemufvdkufwynfhrjzpfrD ol\trnfrSm
>> av0djzp\f / ool n f wad b;&,d tidk tf e;DwGi f a,Z;lociEf iS ahf wG U Ny;D
>> -----
>> And here are the first few lines of Matthew copied from the Pagemaker file:
>> -----
>> Sifrmaw;OD; {0Ha*vdusrf;
>> The Gospel According to Matthew
>> ed'gef;
>> usrf;�yyk*�dKvf &Sifrmaw;OD;\b0rSwfwrf;
>> usrf;�yyk*�dKvf &Sifrmaw;OD;onf *gavav;,e,frS *sL;vlrsKd;
>> tmvfaz;OD;\om;jzpf\/ (rmuk 2;14) olonf tcGefcHoltjzpf trIxrf;chJonf/ (vk
>> 5;27) a,Zl;ocif\aemufvdkufwynfhrjzpfrD ol\trnfrSm av0djzpf\/ olonf
>> wdab;&d,tkdifteD;wGif a,Zl;ocifESifhawGU NyD;
>>
>> You can see that some letters have changed, and some others are in a
>> different order.
>>
>> The letters that change are likely those points that aren't compatible with
>> unicode, and pagemaker reassigned them to ensure that the file is more
>> widely viewable. Since a conversion is already planned, these won't matter
>> as much, but the font embedded in the PDF is different than the font
>> attached to the pagemaker file. If you do start from the PDF, you'll need
>> to extract the font to get the code points.
>>
>> The problem is that the PDF export from pagemaker sorts the letters into the
>> order they appear on the page. Burmese text has Indian style ligatures,
>> where vowels tend to jump over or under the previous letters, sometimes back
>> two or three letters. If you study the snippets above from the beginning
>> of Matthew, you can see there is a difference in order, and that some
>> glyphs are modified.
>>
>> So, from the PDF letters are out of order, but from Pagemaker, letters are
>> encoded into control points. Fixing the control points is easy and happens
>> with the unicode conversion. Fixing the letter order is not easy. You'll
>> need a first language speaker and plenty of time.
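
A minimal sketch of what "fixing the letter order" means, for one simple case only. It assumes the text has already been mapped to Unicode code points but is still in visual (page) order. In Unicode, the Burmese vowel sign E (U+1031) is stored after its consonant even though it is drawn to the left of it. The function name and the consonant range check are illustrative choices, and medials, stacked consonants and the other cases mentioned above are deliberately ignored.

```python
E_VOWEL = "\u1031"  # MYANMAR VOWEL SIGN E, drawn to the left of its consonant


def is_consonant(ch):
    # Basic Myanmar letters KA..A (U+1000..U+1021); an assumption that
    # ignores extended letters used by other languages.
    return "\u1000" <= ch <= "\u1021"


def reorder_e_vowel(visual):
    """Move a vowel sign E that precedes a consonant to just after it."""
    out = []
    i = 0
    while i < len(visual):
        nxt = visual[i + 1] if i + 1 < len(visual) else ""
        if visual[i] == E_VOWEL and nxt and is_consonant(nxt):
            out.append(nxt)      # consonant first (logical order)
            out.append(E_VOWEL)  # then the vowel sign
            i += 2
        else:
            out.append(visual[i])
            i += 1
    return "".join(out)
```

A real converter needs many more rules than this, which is why a first-language reader is still needed to check the result.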
>>
>> The guidance I received on another group was to use either LO Draw or
>> Indesign to export the text from Pagemaker. I'll look into LO Draw again,
>> but I don't have access to an older version of Indesign (the pagemaker
>> import was removed in CS6).
>>
>> On Mon, May 13, 2019 at 10:40 AM Michael H <cma...@gmail.com> wrote:
>>
>>> I unzipped the pagemaker file, and when I open NT_Proverb/Pagemaker
>>> (10.1mb), with a Hex editor, I can 'find' all of the book names, and see
>>> the text there.
>>>
>>> To see the raw text: rename NT_Proverb.pmd > NT_Proverb.zip and open it
>>> with a zip archive program. The text is in the Pagemaker file at the top
>>> level of the archive, but encoded with a lot of extraneous information.
>>> (The English text "Matthew" appears at hex location 7A76972).
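
As an alternative to searching by hand in a hex editor, a short script can list every byte offset at which a known ASCII string occurs. This is only a sketch: the helper names are made up for illustration, and it assumes the string is stored as plain single-byte text, as the book names appear to be.

```python
def find_offsets(data, needle):
    """Return every byte offset in data at which needle starts."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


def find_in_file(path, needle):
    """Read a file as raw bytes and search it for needle."""
    with open(path, "rb") as fh:
        return find_offsets(fh.read(), needle)
```

For example, `[hex(o) for o in find_in_file("NT_Proverb.pmd", b"Matthew")]` would print the offsets in the same hex form a hex editor shows.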
>>>
>>> When I open the fonts with fontforge, Fontforge suggests the fonts are
>>> encoded as unicode (but the glyphs are obviously not in the right spot.)
>>> However when I copy the text (I copied from LO Draw) and paste it into
>>> jedit and save that as unicode: Reopening the file has a warning 'not
>>> unicode, text may be missing'.
>>>
>>> So, what this means is that there are some glyphs encoded into locations
>>> that unicode treats as control or non-printing codes. The text needs to be
>>> dealt with as a specific encoding that matches whatever the original font
>>> actually uses. I haven't figured out what the original text files were
>>> encoded with. Without that knowledge, I'm not sure my system clipboard or
>>> editor (jedit) will properly respect the glyphs in unusual locations until
>>> the conversion to unicode, and I don't trust myself to be able to detect if
>>> it is or is not properly converted.
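
One way to check whether pasted text contains glyphs sitting in control or non-printing locations is to scan it for code points whose Unicode general category starts with "C". This is a sketch, not a full validator: the function name is invented here, and treating newline, carriage return and tab as harmless is an assumption.

```python
import unicodedata

# Whitespace controls that are expected in ordinary text (an assumption).
ALLOWED = {"\n", "\r", "\t"}


def find_suspect_code_points(text):
    """Return (index, 'U+XXXX', category) for control-like characters."""
    hits = []
    for i, ch in enumerate(text):
        if ch in ALLOWED:
            continue
        cat = unicodedata.category(ch)
        # Cc = control, Cf = format, Co = private use, Cn = unassigned, Cs = surrogate
        if cat.startswith("C"):
            hits.append((i, "U+%04X" % ord(ch), cat))
    return hits
```

An empty result does not prove the text survived the clipboard intact, but a non-empty one shows exactly which positions use problematic code points.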
>>>
>>> On Mon, May 13, 2019 at 10:11 AM Cyrille <lafricai...@gmail.com> wrote:
>>>
>>>> David,
>>>> You are probably right about
>>>> [TECkit](http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=TECkit),
>>>> if we get the text it will help us to convert it to Unicode.
>>>> As for how to get the text, your method is beyond my skills :)
>>>> If you succeed, please let me know.
>>>>
>>>> On 13/05/2019 16:21, David Haslam wrote:
>>>>
>>>>> Given the insights from Michael Hart, it may be feasible to temporarily
>>>>> rearrange the main text stream as follows :
>>>>>
>>>>> 1. Replace every EOL by a horizontal tab.
>>>>> 2. Insert an EOL after each verse end character.
>>>>>
>>>>> Observe that the above two steps are wholly reversible such that the
>>>>> original text stream can be restored later.
>>>>>
>>>>> In effect the text stream is now in verse per line (VPL) layout, albeit
>>>>> without verse tags. Some adjustments may be necessary if there are any
>>>>> section headings, etc.
>>>>>
>>>>> 3. Add line numbers with the first number being reset to 1 at the start
>>>>> of each chapter, numbers incrementing by 1 for each line.
>>>>> 4. Add a left margin USFM verse tag \v_
>>>>>
>>>>> Steps 3&4 can be implemented in various ways. For my part, I’d use a
>>>>> bespoke TextPipe filter.
>>>>>
>>>>> Another method to consider might be to use Excel formulae. I recall
>>>>> resorting to such a method in the early days of Go Bible.
>>>>>
>>>>> Now restore the original layout by reverting steps 2 & 1, if this is
>>>>> really necessary. That is, if the original text layout appeared to be
>>>>> paragraphed.
>>>>>
>>>>> 5. Decide how & where to insert paragraph tags.
>>>>>
>>>>> 6. Add chapter tags, book ID and main title tags, etc.
>>>>>
>>>>> Hope this gives some useful suggestions that point towards a practical
>>>>> solution.
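
Steps 1 to 4 above could be sketched in Python roughly as follows. The function names are invented for illustration, and the verse end character is passed in as a parameter because the right value depends on the source encoding. The sketch also assumes the stream contains no tabs of its own (otherwise step 1 is not reversible) and that the caller has already split the stream by chapter, since the text itself carries no chapter markers.

```python
def to_vpl(stream, verse_end):
    """Steps 1 and 2: one verse per line, with original EOLs kept as tabs."""
    flat = stream.replace("\n", "\t")                       # step 1
    return flat.replace(verse_end, verse_end + "\n").split("\n")  # step 2


def from_vpl(lines):
    """Revert steps 2 and 1 to restore the original layout."""
    return "".join(lines).replace("\t", "\n")


def tag_verses(lines, first_verse=1):
    """Steps 3 and 4: number the lines of ONE chapter and add USFM \\v tags."""
    verses = [line.strip() for line in lines if line.strip()]
    return ["\\v %d %s" % (n, verse)
            for n, verse in enumerate(verses, start=first_verse)]
```

Calling `tag_verses` once per chapter gives the numbering reset described in step 3. Section headings and book introductions would still need to be removed or tagged separately before numbering.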
>>>>>
>>>>> Best regards
>>>>>
>>>>> David
>>>>>
>>>>> On Mon, May 13, 2019 at 14:57, Michael H <cma...@gmail.com> wrote:
>>>>>
>>>>>> Cyrille
>>>>>>
>>>>>> LibreOffice Draw attempts to open the pagemaker file, with limited
>>>>>> success. But it confirms that even in the pagemaker source, the verse
>>>>>> numbers are a separate text stream. With this source, there is no way to
>>>>>> copy the text with verse numbers intact. Each book appears to be stored
>>>>>> in its own text stream in the pagemaker file. LO Draw isn't rendering
>>>>>> all of the pages, only the first 10, so I've only explored Matthew
>>>>>> further.
>>>>>>
>>>>>> Based on Matthew only, the verses seem to all end with the character "-"
>>>>>> or ";/", which should aid in the reconstruction. I've looked through the
>>>>>> PDF and this seems to be the case for all books visually as well.
>>>>>> However, this isn't perfect: I find 1107 of these characters in Matthew,
>>>>>> instead of the expected 1071 verses. But since the text stream has a
>>>>>> book introduction, this is likely easily explained. Hopefully this gets
>>>>>> you well down the path to creating a stream with verses.
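
The count described above can be reproduced with a few lines of Python. This is a sketch: the function names are illustrative, and the default terminators are simply the two strings quoted above as they appear in the legacy encoding.

```python
def count_verse_ends(text, terminators=(";/", "-")):
    """Count occurrences of each verse-ending string in a book's text stream."""
    return sum(text.count(t) for t in terminators)


def surplus(found, expected):
    """How many more terminators were found than there are verses."""
    return found - expected
```

With the figures quoted for Matthew, `surplus(1107, 1071)` is 36, which is the number of extra terminators that the book introduction and any headings would have to account for.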
>>>>>>
>>>>>> I would NOT start from the PDF file, but from the pagemaker file. The
>>>>>> PDF almost certainly has a lot of text rearranging and extra characters
>>>>>> like page numbers and running heads. Pagemaker has the book text in a
>>>>>> single stream, in a form that will convert to unicode relatively easily.
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list:
>>>>> sword-devel@crosswire.org
>>>>>
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page