I have not understood everything yet, but I trust you. If you have the
courage to explain it to me, I would like to learn :)
What I don't understand is how you find the marker for each verse and
chapter in the UTF-8 text. What is this marker?

On 15/05/2019 19:03, David Haslam wrote:
> Michael’s description matches how I imagined the method
> during my waking moments this morning. :)
>
> David
>
> Sent from ProtonMail Mobile
>
>
> On Wed, May 15, 2019 at 17:33, Michael H <cma...@gmail.com> wrote:
>> I've been working long hours and emailing in my break time.  David
>> has the basics of converting to VPL.  
>>
>> I would then make the entire work a column in a spreadsheet. 
>>
>> Then in other columns insert a list of Book/chapter/verse (BCV) in order.
>>
>> The BCV and verse-text columns should align and can be verified, and
>> adjusted where things don't match perfectly, for example if 3 John has
>> 15 verses instead of 14.
>>
>> Once the columns align, you can merge them into another column via
>> concatenation operations (&).  This last column becomes your output. 
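The column-alignment idea can also be sketched in Python instead of a spreadsheet. This is only an illustration under stated assumptions: the function names, the "Matt" book label, the verse counts and the placeholder verse text are all hypothetical, and the output is assumed to be one verse per line (VPL) with the reference in front of the text.

```python
def make_refs(book, verses_per_chapter):
    """Build 'Book C:V' references from a list of verse counts, one per chapter."""
    return [f"{book} {chapter}:{verse}"
            for chapter, count in enumerate(verses_per_chapter, start=1)
            for verse in range(1, count + 1)]


def merge_bcv(refs, verses):
    """Pair each reference with its verse text, like =A1 & " " & B1 in a spreadsheet."""
    if len(refs) != len(verses):
        # Same check as noticing that the two spreadsheet columns do not line up.
        raise ValueError(f"{len(refs)} references but {len(verses)} verse lines")
    return [f"{ref} {text}" for ref, text in zip(refs, verses)]


# Hypothetical data: 2 verses in chapter 1 and 1 verse in chapter 2.
refs = make_refs("Matt", [2, 1])
verses = ["first verse text", "second verse text", "third verse text"]
vpl_lines = merge_bcv(refs, verses)
```

The length check plays the role of the manual verification step: if a book has one verse more or fewer than the reference list expects, the merge stops instead of silently shifting every following verse.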
>>
>> The output needs to take into account that section titles and section
>> ranges belong in front of the verse marker. That is a somewhat more
>> complex search and replace, but it can be done successfully.
>>
>>
>>
>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfh...@protonmail.com> wrote:
>>
>>     The attachment contains a counted list of Myanmar words
>>     containing a font conversion error.
>>     /NB. We need to match these words with what they are in the
>>     legacy font./
>>
>>     This issue should be discussed with the current maintainer of the
>>     SIL *TECkit* converter, whoever that may be.
>>
>>     It may be worthwhile asking our friends at the SIL *Writing
>>     Systems Technology* team. See
>>     https://scripts.sil.org/default
>>
>>     /Aside: My friend Martin Hosken of SIL knew the late Keith
>>     Stribley - the former webmaster of ThanLwinSoft./
>>
>>     Best regards,
>>
>>     David
>>
>>     Sent with ProtonMail <https://protonmail.com> Secure Email.
>>
>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>     On Wednesday, May 15, 2019 4:41 PM, David Haslam
>>     <dfh...@protonmail.com> wrote:
>>
>>>     _*Observations*: (continued)_
>>>
>>>     5. The string "*Kd;*" also looks anomalous. It's found only once in 
>>>     ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>
>>>     6. It's evident from the PDF file that the text is paragraphed
>>>     with indented first lines. See 
>>>     
>>> https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>
>>>     My hunch is that these leading paragraph indents may have been
>>>     coded within contents.xml as the self-closing
>>>     element *<text:tab/>*. There are 372 matches to this.
>>>
>>>     So not only do we need to provide chapter and verse tags (plus
>>>     section headings & parallel passage titles, etc), we also need
>>>     to reconstruct all the paragraph tags.
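The hunch about `<text:tab/>` could be checked with a short script. The following is a minimal sketch, assuming the XML has been read into a string; the function names and the sample markup are hypothetical and are not taken from the real file.

```python
import re

# Self-closing tab element, allowing optional whitespace before "/>".
TAB = re.compile(r"<text:tab\s*/>")
# A paragraph element whose first child is a tab, i.e. an indented first line.
INDENTED_PARA = re.compile(r"<text:p\b[^>]*>\s*<text:tab\s*/>")


def count_tabs(xml_text):
    """Count every <text:tab/> element in the XML text."""
    return len(TAB.findall(xml_text))


def count_indented_paragraphs(xml_text):
    """Count <text:p> elements that begin with a tab."""
    return len(INDENTED_PARA.findall(xml_text))


# Hypothetical sample, not the real file content.
sample = ('<text:p text:style-name="P1"><text:tab/>First paragraph</text:p>'
          '<text:p text:style-name="P1">continuation line</text:p>'
          '<text:p text:style-name="P1"><text:tab/>Second paragraph</text:p>')
tab_total = count_tabs(sample)
indented_total = count_indented_paragraphs(sample)
```

If the two counts agree on the real file, every tab sits at the start of a paragraph element, which would support using it as the paragraph marker.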
>>>
>>>     /NB. All structural XML indents were removed by the filter
>>>     "Remove blanks at SOL" in the file /*/contents.pp.tx/*/ that
>>>     was output by my simple TextPipe filter. So that's quite a
>>>     different matter./
>>>
>>>     Best regards,
>>>
>>>     David
>>>
>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>     On Wednesday, May 15, 2019 2:22 PM, David Haslam
>>>     <dfh...@protonmail.com> wrote:
>>>
>>>>     _*Observations:* (continued)_
>>>>
>>>>     4. In addition to the reported instances of the anomalous 3
>>>>     characters (*È,Ø,ò*) found after the font conversion,
>>>>     there are 6 instances of the string "*m;*" that are
>>>>     also probably due to bugs in the converter.
>>>>
>>>>     Best regards,
>>>>
>>>>     David
>>>>
>>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>     On Wednesday, May 15, 2019 12:41 PM, David Haslam
>>>>     <dfh...@protonmail.com> wrote:
>>>>
>>>>>     Yep - sure - later I can do that. 
>>>>>
>>>>>     David
>>>>>
>>>>>
>>>>>     On Wed, May 15, 2019 at 11:26, Cyrille <lafricai...@gmail.com> wrote:
>>>>>>     David, I have no Box account, and I don't want to create one.
>>>>>>     Could you upload it to https://framadrop.org/ instead? It's totally
>>>>>>     free, secure and private.
>>>>>>     Thank you.
>>>>>>
>>>>>>
>>>>>>     On 15/05/2019 11:46, David Haslam wrote:
>>>>>>>     Interim progress report.
>>>>>>>
>>>>>>>     I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped
>>>>>>>     the contents to Mat_utf8-odt
>>>>>>>
>>>>>>>     I opened the .odt file using 7-Zip from the Windows Explorer
>>>>>>>     context menu, and extracted the file contents.xml
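The same extraction step can be scripted, since an .odt file is a ZIP package. Below is a minimal sketch using Python's standard `zipfile` module. The function name is hypothetical, and the member name is a parameter because ODF packages normally call this member `content.xml`, while the message above refers to it as contents.xml. A tiny stand-in package is built in memory so the sketch runs without the real file.

```python
import io
import zipfile


def read_odt_member(odt_file, member="content.xml"):
    """Return one member of an .odt package (a ZIP archive) as UTF-8 text."""
    with zipfile.ZipFile(odt_file) as package:
        return package.read(member).decode("utf-8")


# Stand-in package built in memory; its content is a placeholder, not the real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("content.xml", "<office:document-content/>")
buffer.seek(0)

xml_text = read_odt_member(buffer)
```

With the real document, the path to the .odt file would be passed in place of `buffer`.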
>>>>>>>
>>>>>>>     I used the Notepad++ plug-in XMLTools to pretty print the XML file
>>>>>>>     and saved it as contents.pp.xml
>>>>>>>     This is simply a layout change that makes it easier to read.
>>>>>>>
>>>>>>>     I viewed the .pp.xml file in BabelPad, which confirmed that the
>>>>>>>     non-XML text was (mostly) Myanmar Unicode.
>>>>>>>
>>>>>>>     I used a TextPipe filter to remove all XML tags, blanks at SOL &
>>>>>>>     EOL (start and end of line), and all blank lines.
>>>>>>>     The output file is now contents.pp.txt
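For readers without TextPipe, the three filter steps can be approximated in a few lines of Python. This is a rough sketch, not the actual TextPipe filter: the function name and the sample markup are hypothetical, and a simple regular expression stands in for real XML parsing, which is enough only because nothing but the text between tags is wanted.

```python
import re


def xml_to_plain_text(xml_text):
    """Remove XML tags, trim blanks at start and end of line, drop blank lines."""
    no_tags = re.sub(r"<[^>]+>", "", xml_text)               # step 1: strip all tags
    stripped = (line.strip() for line in no_tags.splitlines())  # step 2: trim SOL/EOL
    return "\n".join(line for line in stripped if line)      # step 3: drop blank lines


# Hypothetical pretty-printed sample with placeholder text.
sample = ("<text:p>\n"
          "    <text:span>The Gospel</text:span>   \n"
          "</text:p>\n"
          "\n"
          "<text:p>  verse text  </text:p>")
plain = xml_to_plain_text(sample)
```

Note that this sketch leaves XML entities such as `&amp;` untouched; whether the original filter decoded them is not stated in the thread.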
>>>>>>>
>>>>>>>     This is now readable content in Myanmar Unicode, with some English
>>>>>>>     text such as "The Gospel according Matthew" near the start.
>>>>>>>
>>>>>>>     The file is best viewed using BabelPad with the option Display
>>>>>>>     Colours | Colour Code by Script.
>>>>>>>     This shows Myanmar characters in light green, and non-Myanmar
>>>>>>>     characters in other colours.
>>>>>>>
>>>>>>>     Observations:
>>>>>>>     1. The font conversion to Unicode left a few scattered characters
>>>>>>>     unconverted. :(
>>>>>>>
>>>>>>>     0000C8      È       18      LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>>     0000D8      Ø       20      LATIN CAPITAL LETTER O WITH STROKE
>>>>>>>     0000F2      ò       3       LATIN SMALL LETTER O WITH GRAVE
>>>>>>>
>>>>>>>     The complete character frequency analysis is attached.
>>>>>>>
>>>>>>>     2. A few of what appear to be verse numbers are still present here
>>>>>>>     and there.
>>>>>>>     3. The content contains section headings and parallel passage
>>>>>>>     headings as well as verse text.
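A character frequency table like the one in observation 1 can be produced with the Python standard library. The sketch below is an assumption about how such a report could be built, not the tool actually used. The function names are hypothetical, and only the main Myanmar block (U+1000 to U+109F) is treated as Myanmar, so Latin letters, digits and punctuation from legitimate English text would also be listed as suspects.

```python
import unicodedata
from collections import Counter


def char_frequencies(text):
    """Return (code point, character, count, Unicode name) rows, sorted by code point."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [(f"{ord(ch):06X}", ch, n, unicodedata.name(ch, "<unnamed>"))
            for ch, n in sorted(counts.items())]


def is_myanmar(ch):
    """True for characters in the main Myanmar Unicode block only."""
    return "\u1000" <= ch <= "\u109f"


# Hypothetical sample: two Myanmar letters mixed with stray Latin-1 characters.
sample = "\u1000\u1000\u00c8 \u00d8\u00c8"
rows = char_frequencies(sample)
suspects = [row for row in rows if not is_myanmar(row[1])]
```

Filtering the rows this way gives roughly the same information as BabelPad's colour coding by script: anything outside the Myanmar block is a candidate for a font-conversion error.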
>>>>>>>
>>>>>>>     I have just uploaded the file contents.pp.zip to a new folder in my
>>>>>>>     Box account and added Cyrille & Michael as viewers.
>>>>>>>
>>>>>>>
>>>>>>>     Best regards,
>>>>>>>
>>>>>>>     David
>>>>>>>
>>>>>>>     ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>     On Monday, May 13, 2019 9:19 AM, Cyrille <lafricai...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>     Hello,
>>>>>>>>     I recently received a modern Myanmar translation of the NT, Psalms
>>>>>>>>     and Proverbs, with permission to create a new module.
>>>>>>>>     But there are many problems. The first is getting the text.
>>>>>>>>     I tried different ways, but it was made with PageMaker!
>>>>>>>>     I can get the text, but I don't have the verse numbers, because
>>>>>>>>     they sit in a parallel column, and when I copy the text I get only
>>>>>>>>     the biblical text.
>>>>>>>>     I also have a PDF, but when I convert it to text (with pdftotext)
>>>>>>>>     the columns are mixed.
>>>>>>>>     Can someone help me with any ideas?
>>>>>>>>     The next problem is Unicode: the text is not typed in Unicode but
>>>>>>>>     uses a special font.
>>>>>>>>     I can send everything you need or push it to git.crosswire.
>>>>>>>>
>>>>>>>>     Thanks for your help.
>>>>>>>>
>>>>>>>>     sword-devel mailing list: sword-devel@crosswire.org
>>>>>>>>     http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>     Instructions to unsubscribe/change your settings at above page
>>>>>>>>