Cyrille writes,
"Do you know which mark?"

I've yet to do the more detailed analysis, but these 3 are initial candidates:

U+1038 း 7,959 MYANMAR SIGN VISARGA
U+104A ၊ 601 MYANMAR SIGN LITTLE SECTION
U+104B ။ 1,489 MYANMAR SIGN SECTION

But as I observed before, working out where each verse ends requires more than a simple
"blanket" rule.
For comparison, there are many more Visarga signs than verse endings, just as there
are many more commas in the KJV than commas that end a verse.
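One rough way to see the problem in numbers is a small Python sketch that counts each candidate mark across the whole text, and also counts how often it is the last non-blank character of a line. It assumes the input is contents.pp.txt read as UTF-8. Because those lines follow the XML paragraphs rather than the verses, the line-final figure is only a crude signal, not a verse-end detector.

```python
from collections import Counter

# The three candidate verse-final marks listed above.
CANDIDATES = {
    "\u1038": "MYANMAR SIGN VISARGA",
    "\u104A": "MYANMAR SIGN LITTLE SECTION",
    "\u104B": "MYANMAR SIGN SECTION",
}


def candidate_counts(text):
    """Return {mark: (total occurrences, occurrences ending a line)}.

    A mark "ends a line" when it is the last non-blank character of that line.
    """
    total = Counter(ch for ch in text if ch in CANDIDATES)
    line_final = Counter()
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped and stripped[-1] in CANDIDATES:
            line_final[stripped[-1]] += 1
    return {mark: (total[mark], line_final[mark]) for mark in CANDIDATES}
```

Usage: `candidate_counts(open("contents.pp.txt", encoding="utf-8").read())`. The Visarga comparison above predicts a total for U+1038 that is much larger than its line-final count.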

Observations: (continued)
Still within the scope of contents.pp.txt derived from Mat_utf8.odt

7. I just found an anomalous 'S' that looks like a further font conversion bug.
ဒါဝိဒ်မင်းကြီးတွင် ဥရိယ၏ဇနီးဖြစ်ခဲ့ဖူးသည့်မိန်းမမှ ဖွားမြင်သောသား ဆောလမွန်၊- 
ဆောလမွန်၏သား ရေဟိုးဘိုအမ်၊ ရေဟိုးဘိုအမ်၏သား အာဘီဂျ၊ အာဘီဂျ ၏သား အာဆ၊- အာဆ၏သား 
ဂျေဟိုးရှဖတ်၊ ဂျေဟိုး Sရှဖတ်၏သား ဂျော်ရမ်၊
and also 'dS' here:
- သင်တို့သည် အဘယ်ကြောင့် အဝတ်အထည်အဖို့ စိုးရိမ်ကြောင့်ကြနေကြသနည်း။ 
လယ်ကွင်းပြင်ရ dS
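A small Python scan can locate stray characters like this 'S' and 'dS'. This sketch rests on one assumption: any run of ASCII letters, Latin-1 letters or ';' in this Myanmar text is suspect. That character set is inferred from the characters reported in this thread. Legitimate English, such as the title near the start of the file, will also be reported and has to be skipped by eye.

```python
import re

# Runs of ASCII letters, Latin-1 letters (excluding U+00D7 and U+00F7) or ';'.
# Assumption: in this Myanmar text such runs are usually conversion debris.
SUSPECT = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF;]+")


def suspects_in_context(text, width=8):
    """Yield (offset, run, context) for each suspect run, where context
    includes up to `width` characters on either side of the run."""
    for m in SUSPECT.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.start(), m.group(), text[start:end]
```

For example, `for pos, run, ctx in suspects_in_context(text): print(pos, run, ctx)` prints one line per hit. The surrounding Myanmar context helps in guessing which legacy-font glyph each run stood for.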

Best regards,

David

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, May 15, 2019 7:43 PM, Cyrille <lafricai...@gmail.com> wrote:

> On 15/05/2019 19:18, David Haslam wrote:
>
>> Each of the last 1 or 2 characters of each verse is a regular Myanmar 
>> punctuation mark.
>
> Do you know which mark?
>
>> We need to be careful how we apply this.  There may well be some exceptions.
>>
>> Windows users should install BabelPad. This free Unicode text editor is 
>> highly recommended.
>>
>> http://www.babelstone.co.uk/Software/BabelPad.html
>>
>> It will help in all sorts of ways, not least in analysis.
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>> On Wed, May 15, 2019 at 18:08, Cyrille <lafricai...@gmail.com> wrote:
>>
>>> I have not understood everything yet... but I trust you. If you have the
>>> time to explain it to me, I would like to learn :)
>>> What I don't understand is how you can find the marker of each verse and
>>> chapter in the UTF-8 text. What is the marker in question?
>>>
>>> On 15/05/2019 19:03, David Haslam wrote:
>>>
>>>> Michael’s description matches how I imagined the method during my waking 
>>>> moments this morning. :)
>>>>
>>>> David
>>>>
>>>> On Wed, May 15, 2019 at 17:33, Michael H <cma...@gmail.com> wrote:
>>>>
>>>>> I've been working long hours and emailing in my breaks. David has
>>>>> the basics of converting to VPL (verse per line).
>>>>>
>>>>> I would then make the entire work a column in a spreadsheet.
>>>>>
>>>>> Then, in other columns, insert a list of book/chapter/verse (BCV)
>>>>> references in order.
>>>>>
>>>>> The BCV and verse-text columns should align; they can be verified, and
>>>>> adjusted where things don't match perfectly, e.g. if 3 John has 15
>>>>> instead of 14 verses.
>>>>>
>>>>> Once the columns align, you can merge them into another column by
>>>>> concatenation (the & operator). This last column becomes your output.
>>>>>
>>>>> The output needs to take into account that section titles and section
>>>>> ranges belong in front of the verse marker. That requires a somewhat more
>>>>> complex search and replace, but it can be done successfully.
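Michael's spreadsheet method can also be sketched in Python. The sketch makes three assumptions. First, the verse texts are already split one per list item. Second, the references come from a per-chapter verse-count table. Third, each output line has the shape `Book C:V text`. That layout and the OSIS-style book ID "3John" are illustrative, not a confirmed import format. As in the spreadsheet, mismatched column lengths are reported rather than silently truncated.

```python
def references(book, verses_per_chapter):
    """List (book, chapter, verse) tuples in canonical order, like the BCV column."""
    return [
        (book, chapter, verse)
        for chapter, count in enumerate(verses_per_chapter, start=1)
        for verse in range(1, count + 1)
    ]


def to_vpl(refs, verses):
    """Join each reference with its verse text, like concatenating two
    aligned spreadsheet columns with &."""
    if len(refs) != len(verses):
        # The columns must align before merging: fix the versification or the split first.
        raise ValueError(f"{len(refs)} references but {len(verses)} verse texts")
    return [f"{book} {ch}:{vs} {text}" for (book, ch, vs), text in zip(refs, verses)]
```

For example, `to_vpl(references("3John", [14]), verses)` raises ValueError if `verses` holds 15 items, which is the 3 John case mentioned above. Section titles would still need separate handling before the verse marker.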
>>>>>
>>>>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfh...@protonmail.com> 
>>>>> wrote:
>>>>>
>>>>>> The attachment contains a counted list of Myanmar words containing a 
>>>>>> font conversion error.
>>>>>> NB. We need to match these words with what they are in the legacy font.
>>>>>>
>>>>>> This issue should be discussed with the current maintainer of the SIL 
>>>>>> TECkit converter, whoever that may be.
>>>>>>
>>>>>> It may be worthwhile asking our friends at the SIL Writing Systems 
>>>>>> Technology team. See
>>>>>> https://scripts.sil.org/default
>>>>>>
>>>>>> Aside: My friend Martin Hosken of SIL knew the late Keith Stribley - the 
>>>>>> former webmaster of ThanLwinSoft.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> David
>>>>>>
>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>> On Wednesday, May 15, 2019 4:41 PM, David Haslam <dfh...@protonmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Observations: (continued)
>>>>>>>
>>>>>>> 5. The string "Kd;" also looks anomalous. It's found only once in
>>>>>>> ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>>>>>
>>>>>>> 6. It's evident from the PDF file that the text is paragraphed with 
>>>>>>> indented first lines. See
>>>>>>> https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>>>>>
>>>>>>> My hunch is that these leading paragraph indents may have been coded
>>>>>>> within content.xml as the self-closing element <text:tab/>. There are
>>>>>>> 372 matches for this.
>>>>>>>
>>>>>>> So not only do we need to provide chapter and verse tags (plus section 
>>>>>>> headings & parallel passage titles, etc), we also need to reconstruct 
>>>>>>> all the paragraph tags.
>>>>>>>
>>>>>>> NB. All structural XML indents were removed by the filter "Remove
>>>>>>> blanks at SOL" in the file contents.pp.txt that was output by my simple
>>>>>>> TextPipe filter. So that's quite a different matter.
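The <text:tab/> hunch can be checked directly against the .odt package in Python. This is a minimal sketch. Inside an OpenDocument package, the body XML is stored as content.xml. The pattern also tolerates attributes on the element, which goes slightly beyond the literal string counted above.

```python
import re
import zipfile

# Matches <text:tab/>, <text:tab /> and <text:tab with attributes .../>,
# but not longer element names such as <text:tab-something/>.
TEXT_TAB = re.compile(r"<text:tab(?:\s[^>]*)?/>")


def read_content_xml(odt_path):
    """Return the content.xml member of an .odt package as a string."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read("content.xml").decode("utf-8")


def count_text_tabs(xml):
    """Count self-closing text:tab elements in an ODF XML string."""
    return len(TEXT_TAB.findall(xml))
```

The result of `count_text_tabs(read_content_xml("Mat_utf8.odt"))` can be compared with the 372 matches found in the pretty-printed copy.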
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>> On Wednesday, May 15, 2019 2:22 PM, David Haslam 
>>>>>>> <dfh...@protonmail.com> wrote:
>>>>>>>
>>>>>>>> Observations: (continued)
>>>>>>>>
>>>>>>>> 4. In addition to the reported instances of the anomalous 3 characters 
>>>>>>>> (È,Ø,ò) found after the font conversion,
>>>>>>>> there are 6 instances of the string "m;" that are also probably due to 
>>>>>>>> bugs in the converter.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>> On Wednesday, May 15, 2019 12:41 PM, David Haslam 
>>>>>>>> <dfh...@protonmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yep - sure - later I can do that.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On Wed, May 15, 2019 at 11:26, Cyrille <lafricai...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> David, I have no Box account, and I don't want to create one. Can you
>>>>>>>>>> upload it to https://framadrop.org/? It's totally free and secure (and
>>>>>>>>>> private).
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>> On 15/05/2019 11:46, David Haslam wrote:
>>>>>>>>>>
>>>>>>>>>>> Interim progress report.
>>>>>>>>>>>
>>>>>>>>>>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped 
>>>>>>>>>>> the contents to Mat_utf8-odt
>>>>>>>>>>>
>>>>>>>>>>> I opened the .odt file using 7-Zip from the Windows Explorer
>>>>>>>>>>> context menu, and extracted the file content.xml
>>>>>>>>>>>
>>>>>>>>>>> I used Notepad++ plug-in XMLTools to pretty print the XML file and 
>>>>>>>>>>> saved it as contents.pp.xml
>>>>>>>>>>> This is simply a layout change that's easier to read.
>>>>>>>>>>>
>>>>>>>>>>> I viewed the .pp.xml file in BabelPad, which confirmed that the 
>>>>>>>>>>> non-XML text was (mostly) Myanmar Unicode.
>>>>>>>>>>>
>>>>>>>>>>> I used a TextPipe filter to remove all XML tags, blanks from SOL & 
>>>>>>>>>>> EOL and all blank lines.
>>>>>>>>>>> The output file is now contents.pp.txt
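The extraction can also be approximated in Python, without Notepad++ or TextPipe. This is a swapped-in technique, not a reproduction of the filter above. Instead of stripping tags from pretty-printed XML, it parses content.xml and emits one trimmed line per ODF paragraph or heading (text:p, text:h), dropping blank lines. Empty elements such as <text:tab/> and <text:s/> contribute no characters. Nested paragraphs (e.g. inside notes) would be repeated within their parent.

```python
import xml.etree.ElementTree as ET
import zipfile

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCKS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraphs_from_xml(xml_bytes):
    """Return the trimmed, non-empty text of each text:p / text:h element."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for elem in root.iter():
        if elem.tag in BLOCKS:
            # itertext() joins the element's text with all nested text and tails.
            line = "".join(elem.itertext()).strip()  # blanks at SOL and EOL
            if line:  # drop blank lines
                lines.append(line)
    return lines


def odt_to_text(odt_path):
    """Read content.xml from an .odt package and return one paragraph per line."""
    with zipfile.ZipFile(odt_path) as odt:
        return "\n".join(paragraphs_from_xml(odt.read("content.xml")))
```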
>>>>>>>>>>>
>>>>>>>>>>> This is now something that's readable content in Myanmar Unicode, 
>>>>>>>>>>> with some English text such as "The Gospel according Matthew" near 
>>>>>>>>>>> the start.
>>>>>>>>>>>
>>>>>>>>>>> The file is best viewed using BabelPad with the option Display 
>>>>>>>>>>> Colours | Colour Code by Script.
>>>>>>>>>>> This shows Myanmar characters in light green, and non-Myanmar 
>>>>>>>>>>> characters in other colours.
>>>>>>>>>>>
>>>>>>>>>>> Observations:
>>>>>>>>>>> 1. The font conversion to Unicode left a few scattered characters 
>>>>>>>>>>> unconverted. :(
>>>>>>>>>>>
>>>>>>>>>>> 0000C8      È       18      LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>>>>>> 0000D8      Ø       20      LATIN CAPITAL LETTER O WITH STROKE
>>>>>>>>>>> 0000F2      ò       3       LATIN SMALL LETTER O WITH GRAVE
>>>>>>>>>>>
>>>>>>>>>>> The complete character frequency analysis is attached.
>>>>>>>>>>>
>>>>>>>>>>> 2. A few apparent verse numbers are still present here and there.
>>>>>>>>>>> 3. The content contains section headings and parallel passage 
>>>>>>>>>>> headings as well as verse text.
>>>>>>>>>>>
>>>>>>>>>>> I have just uploaded the file contents.pp.zip to a new folder in my 
>>>>>>>>>>> Box account and added Cyrille & Michael as viewers.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>>>>> On Monday, May 13, 2019 9:19 AM, Cyrille <lafricai...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I recently received a modern Myanmar translation of the NT, Psalms and
>>>>>>>>>>>> Proverbs, with permission to create a new module.
>>>>>>>>>>>> But the problems are many... First, getting the text.
>>>>>>>>>>>> I tested different ways, but it was made with PageMaker!
>>>>>>>>>>>> I can get the text, but the problem is that I don't have the verse
>>>>>>>>>>>> numbers, because they are in a separate parallel column, and when I
>>>>>>>>>>>> copy the text I get only the biblical text.
>>>>>>>>>>>> I also have a PDF, but when I convert it to text (with pdftotext) the
>>>>>>>>>>>> columns are mixed.
>>>>>>>>>>>> Can someone help me with any ideas?
>>>>>>>>>>>> The next problem is Unicode... The text is not typed in Unicode but
>>>>>>>>>>>> uses a special font.
>>>>>>>>>>>> I can send everything you need or push it to git.crosswire.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> sword-devel mailing list:
>>>>>>>>>>>> sword-devel@crosswire.org
>>>>>>>>>>>>
>>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>>>>>>
00001   အကျွန်ုပ်၏အဖကိုသဂြုØဟ်ပါရစေဟု
00001   အကြီးအမÈ;ဖြစ်လိုသောသူသည်
00001   ထိုသချုØင်းဂူတို့သည်
00001   နှစ်နှစ်အောက်ယောက်ျm;
00001   ရည်òန်း၍၊
00001   ကြီးမÈ;
00001   ပရောဖက်တို့၏သချုØင်း
00001   တပ်မÈ;အား
00001   တပ်မÈ;နှင့်
00001   လူတစ်ရာတပ်မÈ;က
00001   လူတစ်ရာတပ်မÈ;တစ်ယောက်သည်
00001   လူတစ်ရာတပ်မÈ;၏အစေခံကို
00001   ယောက်ျm;ဦးရေ
00001   ယောက်ျm;နှင့်
00001   ယောက်ျm;သည်
00001   ယေဇူးသခင်ကိုသဂြုØဟ်ခြင်း
00001   မိမိပိုင်ဆိုင်သည့်သချုØင်းဂူ
00001   မိမိတို့ကိုရည်òန်း၍
00001   သဂြုØဟ်ရန်အလို့ငှာ
00001   သဂြုØဟ်ကြပါစေဟု
00001   သဂြုØဟ်ပြီးမှ
00001   သဂြုØဟ်ခြင်းငှာ
00001   သားယောက်ျm;ကို
00001   သားယောက်ျm;ကိုဖွားမြင်လိမ့်မည်။
00001   သချုØင်း
00001   သချုØင်းဂူကို
00001   သချုØင်းဂူတွင်အစောင့်များထားရှိခြင်း
00003   သချုØင်းဂူများအကြားမှထွက်လာပြီးလျှင်
00001   သချုØင်းဂူ၀ရှိကျောက်တုံးကို
00002   သချုØင်းတော်ကို
00001   သချုØင်းတော်မှ
00001   သချုØင်းတွင်းအဝ၌
00001   òန်ကြားထား
00001   dS
00001   Kd;တို့၏ဘုရင်၊
00001   Sရှဖတ်၏သား
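A counted list in this shape can be produced with a short Python sketch. The thread does not say how the attached list was actually generated. This sketch assumes that words are whitespace-separated tokens. It also reuses the suspect-character assumption (ASCII letters, Latin-1 letters, ';'), with an `ignore` set for legitimate English words such as those in the title. Counts are zero-padded to five digits to mirror the attachment, and rows are sorted by code point.

```python
import re
from collections import Counter

# Assumption: a token is suspect if it contains an ASCII letter,
# a Latin-1 letter (excluding U+00D7 and U+00F7) or ';'.
SUSPECT_CHAR = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF;]")


def suspect_word_counts(text, ignore=frozenset()):
    """Count whitespace-separated tokens that contain a suspect character.
    `ignore` lists tokens known to be legitimate (e.g. English title words)."""
    return Counter(
        tok for tok in text.split()
        if tok not in ignore and SUSPECT_CHAR.search(tok)
    )


def format_counts(counts):
    """Render as '00001   word' lines, sorted by code point order of the word."""
    return [f"{n:05d}   {word}" for word, n in sorted(counts.items())]
```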
Code point      Character       Count   Character Name
U+0020          11,545  SPACE
U+0028  (       149     LEFT PARENTHESIS
U+0029  )       149     RIGHT PARENTHESIS
U+002D  -       1,091   HYPHEN-MINUS
U+0031  1       4       DIGIT ONE
U+0032  2       2       DIGIT TWO
U+0036  6       1       DIGIT SIX
U+0038  8       1       DIGIT EIGHT
U+003B  ;       14      SEMICOLON
U+0047  G       1       LATIN CAPITAL LETTER G
U+004B  K       1       LATIN CAPITAL LETTER K
U+004D  M       1       LATIN CAPITAL LETTER M
U+0053  S       2       LATIN CAPITAL LETTER S
U+0054  T       1       LATIN CAPITAL LETTER T
U+0061  a       2       LATIN SMALL LETTER A
U+0063  c       2       LATIN SMALL LETTER C
U+0064  d       3       LATIN SMALL LETTER D
U+0065  e       3       LATIN SMALL LETTER E
U+0067  g       1       LATIN SMALL LETTER G
U+0068  h       2       LATIN SMALL LETTER H
U+0069  i       1       LATIN SMALL LETTER I
U+006C  l       1       LATIN SMALL LETTER L
U+006D  m       6       LATIN SMALL LETTER M
U+006E  n       1       LATIN SMALL LETTER N
U+006F  o       2       LATIN SMALL LETTER O
U+0070  p       1       LATIN SMALL LETTER P
U+0072  r       1       LATIN SMALL LETTER R
U+0073  s       1       LATIN SMALL LETTER S
U+0074  t       2       LATIN SMALL LETTER T
U+0077  w       1       LATIN SMALL LETTER W
U+00C8  È      18      LATIN CAPITAL LETTER E WITH GRAVE
U+00D8  Ø      20      LATIN CAPITAL LETTER O WITH STROKE
U+00F2  ò      3       LATIN SMALL LETTER O WITH GRAVE
U+1000  က     7,640   MYANMAR LETTER KA
U+1001  ခ     2,396   MYANMAR LETTER KHA
U+1002  ဂ     265     MYANMAR LETTER GA
U+1004  င     6,256   MYANMAR LETTER NGA
U+1005  စ     2,392   MYANMAR LETTER CA
U+1006  ဆ     1,020   MYANMAR LETTER CHA
U+1007  ဇ     376     MYANMAR LETTER JA
U+1008  ဈ     3       MYANMAR LETTER JHA
U+1009  ဉ     154     MYANMAR LETTER NYA
U+100A  ည     3,621   MYANMAR LETTER NNYA
U+100B  ဋ     4       MYANMAR LETTER TTA
U+100C  ဌ     7       MYANMAR LETTER TTHA
U+100D  ဍ     9       MYANMAR LETTER DDA
U+100F  ဏ     79      MYANMAR LETTER NNA
U+1010  တ     5,765   MYANMAR LETTER TA
U+1011  ထ     1,461   MYANMAR LETTER THA
U+1012  ဒ     204     MYANMAR LETTER DA
U+1013  ဓ     43      MYANMAR LETTER DHA
U+1014  န     3,173   MYANMAR LETTER NA
U+1015  ပ     2,987   MYANMAR LETTER PA
U+1016  ဖ     974     MYANMAR LETTER PHA
U+1017  ဗ     38      MYANMAR LETTER BA
U+1018  ဘ     458     MYANMAR LETTER BHA
U+1019  မ     5,731   MYANMAR LETTER MA
U+101A  ယ     1,455   MYANMAR LETTER YA
U+101B  ရ     2,536   MYANMAR LETTER RA
U+101C  လ     3,514   MYANMAR LETTER LA
U+101D  ဝ     375     MYANMAR LETTER WA
U+101E  သ     7,122   MYANMAR LETTER SA
U+101F  ဟ     777     MYANMAR LETTER HA
U+1020  ဠ     1       MYANMAR LETTER LLA
U+1021  အ     3,239   MYANMAR LETTER A
U+1024  ဤ     215     MYANMAR LETTER II
U+1025  ဥ     81      MYANMAR LETTER U
U+1026  ဦ     198     MYANMAR LETTER UU
U+1027  ဧ     42      MYANMAR LETTER E
U+1029  ဩ     12      MYANMAR LETTER O
U+102B  ါ     1,453   MYANMAR VOWEL SIGN TALL AA
U+102C  ာ     9,440   MYANMAR VOWEL SIGN AA
U+102D  ိ     8,154   MYANMAR VOWEL SIGN I
U+102E  ီ     876     MYANMAR VOWEL SIGN II
U+102F  ု     8,430   MYANMAR VOWEL SIGN U
U+1030  ူ     2,760   MYANMAR VOWEL SIGN UU
U+1031  ေ     7,541   MYANMAR VOWEL SIGN E
U+1032  ဲ     589     MYANMAR VOWEL SIGN AI
U+1036  ံ     1,129   MYANMAR SIGN ANUSVARA
U+1037  ့     5,309   MYANMAR SIGN DOT BELOW
U+1038  း     7,959   MYANMAR SIGN VISARGA
U+1039  ္     293     MYANMAR SIGN VIRAMA
U+103A  ်     18,107  MYANMAR SIGN ASAT
U+103B  ျ     2,344   MYANMAR CONSONANT SIGN MEDIAL YA
U+103C  ြ     4,347   MYANMAR CONSONANT SIGN MEDIAL RA
U+103D  ွ     1,762   MYANMAR CONSONANT SIGN MEDIAL WA
U+103E  ှ     2,546   MYANMAR CONSONANT SIGN MEDIAL HA
U+1040  ၀     90      MYANMAR DIGIT ZERO
U+1041  ၁     359     MYANMAR DIGIT ONE
U+1042  ၂     242     MYANMAR DIGIT TWO
U+1043  ၃     187     MYANMAR DIGIT THREE
U+1044  ၄     137     MYANMAR DIGIT FOUR
U+1045  ၅     89      MYANMAR DIGIT FIVE
U+1046  ၆     81      MYANMAR DIGIT SIX
U+1047  ၇     61      MYANMAR DIGIT SEVEN
U+1048  ၈     67      MYANMAR DIGIT EIGHT
U+1049  ၉     72      MYANMAR DIGIT NINE
U+104A  ၊     601     MYANMAR SIGN LITTLE SECTION
U+104B  ။     1,489   MYANMAR SIGN SECTION
U+104C  ၌     379     MYANMAR SYMBOL LOCATIVE
U+104D  ၍     564     MYANMAR SYMBOL COMPLETED
U+104E  ၎     54      MYANMAR SYMBOL AFOREMENTIONED
U+104F  ၏     1,699   MYANMAR SYMBOL GENITIVE
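A frequency table with these columns can be generated with Python's standard `unicodedata` module. This minimal sketch assumes the whole UTF-8 text is already in memory. It excludes line breaks, since the table above lists no CR or LF, and formats counts with thousands separators as in the table.

```python
import unicodedata
from collections import Counter


def char_frequency_rows(text):
    """Return (code point, character, count, Unicode name) rows in code point order."""
    counts = Counter(ch for ch in text if ch not in "\r\n")
    return [
        (f"U+{ord(ch):04X}", ch, n, unicodedata.name(ch, "<unnamed>"))
        for ch, n in sorted(counts.items())
    ]


def format_rows(rows):
    """Render rows as tab-separated lines with thousands separators (e.g. 11,545)."""
    return [f"{cp}\t{ch}\t{n:,}\t{name}" for cp, ch, n, name in rows]
```

Comparing the U+00C8, U+00D8 and U+00F2 rows before and after a corrected conversion gives a quick check that the stray characters are gone.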