Cyrille writes,
"Do you know which mark?"
I've yet to do the more detailed analysis, but these 3 are initial candidates:
U+1038 း 7,959 MYANMAR SIGN VISARGA
U+104A ၊ 601 MYANMAR SIGN LITTLE SECTION
U+104B ။ 1,489 MYANMAR SIGN SECTION
But as I observed before, finding where each verse ends requires more than a
simple "blanket" rule.
cf. There are many more Visarga signs than there are verse ends, just as there
are many more commas in the KJV than there are verse-final commas.
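A minimal sketch of how the three candidates could be tallied at line ends, assuming the text has already been split into one candidate verse per line (which it has not yet been, so treat the function and its input as hypothetical). It only looks at the final character of each line, so endings such as "၊-" would fall under "other".

```python
from collections import Counter

# The three candidate marks listed above (code points from the frequency table).
CANDIDATES = {
    "\u1038": "MYANMAR SIGN VISARGA",
    "\u104A": "MYANMAR SIGN LITTLE SECTION",
    "\u104B": "MYANMAR SIGN SECTION",
}


def trailing_mark_counts(lines):
    """Count which candidate mark (or 'other') ends each non-empty line."""
    counts = Counter()
    for line in lines:
        text = line.rstrip()
        if not text:
            continue  # skip blank lines
        last = text[-1]
        counts[last if last in CANDIDATES else "other"] += 1
    return counts


if __name__ == "__main__":
    sample = ["ဆောလမွန်၊", "စိုးရိမ်ကြောင့်ကြနေကြသနည်း။", "", "dS"]
    for mark, n in trailing_mark_counts(sample).items():
        print(n, CANDIDATES.get(mark, mark))
```

Comparing these tallies with the expected number of verses per chapter would show how far a blanket rule falls short.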
Observations: (continued)
Still within the scope of contents.pp.txt derived from Mat_utf8.odt
7. I just found an anomalous 'S' that looks like a further font conversion bug.
ဒါဝိဒ်မင်းကြီးတွင် ဥရိယ၏ဇနီးဖြစ်ခဲ့ဖူးသည့်မိန်းမမှ ဖွားမြင်သောသား ဆောလမွန်၊-
ဆောလမွန်၏သား ရေဟိုးဘိုအမ်၊ ရေဟိုးဘိုအမ်၏သား အာဘီဂျ၊ အာဘီဂျ ၏သား အာဆ၊- အာဆ၏သား
ဂျေဟိုးရှဖတ်၊ ဂျေဟိုး Sရှဖတ်၏သား ဂျော်ရမ်၊
and also
- သင်တို့သည် အဘယ်ကြောင့် အဝတ်အထည်အဖို့ စိုးရိမ်ကြောင့်ကြနေကြသနည်း။
လယ်ကွင်းပြင်ရ dS
Best regards,
David
Sent with [ProtonMail](https://protonmail.com) Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, May 15, 2019 7:43 PM, Cyrille <lafricai...@gmail.com> wrote:
> Il 15/05/2019 19:18, David Haslam ha scritto:
>
>> Each of the last 1 or 2 characters of each verse is a regular Myanmar
>> punctuation mark.
>
> Do you know which mark?
>
>> We need to be careful how we apply this. There may well be some exceptions.
>>
>> Windows users should install BabelPad. This free Unicode text editor is
>> highly recommended.
>>
>> http://www.babelstone.co.uk/Software/BabelPad.html
>>
>> It will help in all sorts of ways, not least in analysis.
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>> On Wed, May 15, 2019 at 18:08, Cyrille <lafricai...@gmail.com> wrote:
>>
>>> I have not understood everything yet, but I trust you. If you have the
>>> patience to explain it to me, I want to learn :)
>>> What I don't understand is how you can find the marker for each verse and
>>> chapter in the UTF-8 text. What is the marker in question?
>>>
>>> Il 15/05/2019 19:03, David Haslam ha scritto:
>>>
>>>> Michael’s description matches how I imagined the method during my waking
>>>> moments this morning. :)
>>>>
>>>> David
>>>>
>>>> Sent from ProtonMail Mobile
>>>>
>>>> On Wed, May 15, 2019 at 17:33, Michael H <cma...@gmail.com> wrote:
>>>>
>>>>> I've been working long hours and emailing in my break time. David has
>>>>> the basics of converting to VPL.
>>>>>
>>>>> I would then make the entire work a column in a spreadsheet.
>>>>>
>>>>> Then, in other columns, insert a list of book/chapter/verse references in order.
>>>>>
>>>>> The BCV and versetext columns should align and can be verified, and
>>>>> adjusted where things don't match perfectly, like maybe 3 John has 15
>>>>> instead of 14 verses.
>>>>>
>>>>> Once the columns align, you can merge them into another column via
>>>>> concatenation operations (&). This last column becomes your output.
>>>>>
>>>>> The output needs to consider that section titles and section ranges
>>>>> belong in front of the verse marker. That is a bit more complex search
>>>>> and replace, but can be done successfully.
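The same column alignment could be sketched outside a spreadsheet. This is only an illustration under assumptions: the function names are hypothetical, the "Book C:V text" line layout is my assumption for the VPL output, and section titles are not handled.

```python
def make_refs(book, verses_per_chapter):
    """Build 'Book C:V' references in order from a list of verse counts."""
    return [
        f"{book} {chapter}:{verse}"
        for chapter, count in enumerate(verses_per_chapter, start=1)
        for verse in range(1, count + 1)
    ]


def merge_vpl(refs, verses):
    """Concatenate each reference with its verse text, one line per verse."""
    if len(refs) != len(verses):
        # A mismatch means the columns do not align yet and need manual review.
        raise ValueError(
            f"{len(refs)} references but {len(verses)} verse lines"
        )
    return [f"{ref} {text}" for ref, text in zip(refs, verses)]


if __name__ == "__main__":
    refs = make_refs("Matt", [2, 1])  # toy verse counts, not the real ones
    for line in merge_vpl(refs, ["first", "second", "third"]):
        print(line)
```

The length check plays the role of eyeballing the two columns: it fails loudly when a book has one verse more or fewer than the reference list expects.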
>>>>>
>>>>> On Wed, May 15, 2019 at 11:12 AM David Haslam <dfh...@protonmail.com>
>>>>> wrote:
>>>>>
>>>>>> The attachment contains a counted list of Myanmar words containing a
>>>>>> font conversion error.
>>>>>> NB. We need to match these words with what they are in the legacy font.
>>>>>>
>>>>>> This issue should be discussed with the current maintainer of the SIL
>>>>>> TECkit converter, whoever that may be.
>>>>>>
>>>>>> It may be worthwhile asking our friends at the SIL Writing Systems
>>>>>> Technology team. See
>>>>>> https://scripts.sil.org/default
>>>>>>
>>>>>> Aside: My friend Martin Hosken of SIL knew the late Keith Stribley - the
>>>>>> former webmaster of ThanLwinSoft.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> David
>>>>>>
>>>>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>>>>>
>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>> On Wednesday, May 15, 2019 4:41 PM, David Haslam <dfh...@protonmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Observations: (continued)
>>>>>>>
>>>>>>> 5. The string "Kd;" also looks anomalous. It's found only once in
>>>>>>> ကိုယ်တော်၏ဦးခေါင်းတော်အပေါ်၌ လည်း ဤသူသည်ကား ဂျူးလူမျ Kd;တို့၏ဘုရင်၊
>>>>>>>
>>>>>>> 6. It's evident from the PDF file that the text is paragraphed with
>>>>>>> indented first lines. See
>>>>>>> https://www.dropbox.com/s/do5e675i19xfomf/Screenshot%202019-05-15%2016.29.10.png?dl=0
>>>>>>>
>>>>>>> My hunch is that these leading paragraph indents may have been coded
>>>>>>> within contents.xml as the self-closing element <text:tab/>. There are
>>>>>>> 372 matches to this.
>>>>>>>
>>>>>>> So not only do we need to provide chapter and verse tags (plus section
>>>>>>> headings & parallel passage titles, etc), we also need to reconstruct
>>>>>>> all the paragraph tags.
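A rough way to test that hunch, assuming the XML is the standard ODF content part and uses the usual text namespace. The function name is hypothetical; it counts all text:tab elements and, separately, the paragraphs that begin with one.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace for the text: prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P_TAG = f"{{{TEXT_NS}}}p"
TAB_TAG = f"{{{TEXT_NS}}}tab"


def count_indented_paragraphs(xml_text):
    """Return (total text:tab elements, paragraphs starting with a text:tab)."""
    root = ET.fromstring(xml_text)
    total_tabs = sum(1 for _ in root.iter(TAB_TAG))
    indented = 0
    for para in root.iter(P_TAG):
        children = list(para)
        starts_with_text = bool((para.text or "").strip())
        # A leading indent: first child is a tab with no text in front of it.
        if children and children[0].tag == TAB_TAG and not starts_with_text:
            indented += 1
    return total_tabs, indented
```

If the second number is close to the first, most tabs are paragraph indents and could be mapped to paragraph starts when the markup is rebuilt.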
>>>>>>>
>>>>>>> NB. All structural XML indents were removed by the filter "Remove
>>>>>>> blanks at SOL" in the file contents.pp.txt that was output by my simple
>>>>>>> TextPipe filter. So that's quite a different matter.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>>>>>>
>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>> On Wednesday, May 15, 2019 2:22 PM, David Haslam
>>>>>>> <dfh...@protonmail.com> wrote:
>>>>>>>
>>>>>>>> Observations: (continued)
>>>>>>>>
>>>>>>>> 4. In addition to the reported instances of the anomalous 3 characters
>>>>>>>> (È,Ø,ò) found after the font conversion,
>>>>>>>> there are 6 instances of the string "m;" that are also probably due to
>>>>>>>> bugs in the converter.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>>>>>>>
>>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>> On Wednesday, May 15, 2019 12:41 PM, David Haslam
>>>>>>>> <dfh...@protonmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yep - sure - later I can do that.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> Sent from ProtonMail Mobile
>>>>>>>>>
>>>>>>>>> On Wed, May 15, 2019 at 11:26, Cyrille <lafricai...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> David, I don't have a Box account, and I don't want to create one.
>>>>>>>>>> Can you upload it to https://framadrop.org/ instead? It's totally
>>>>>>>>>> free and secure (and private).
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>> Il 15/05/2019 11:46, David Haslam ha scritto:
>>>>>>>>>>
>>>>>>>>>>> Interim progress report.
>>>>>>>>>>>
>>>>>>>>>>> I downloaded the file Mat_utf8.zip from Cyrille's link and unzipped
>>>>>>>>>>> the contents to Mat_utf8-odt
>>>>>>>>>>>
>>>>>>>>>>> I opened the .odt file using 7-Zip from the Windows Explorer
>>>>>>>>>>> context menu, and extracted the file contents.xml
>>>>>>>>>>>
>>>>>>>>>>> I used the Notepad++ plug-in XMLTools to pretty-print the XML file
>>>>>>>>>>> and saved it as contents.pp.xml
>>>>>>>>>>> This only changes the layout to make the file easier to read.
>>>>>>>>>>>
>>>>>>>>>>> I viewed the .pp.xml file in BabelPad, which confirmed that the
>>>>>>>>>>> non-XML text was (mostly) Myanmar Unicode.
>>>>>>>>>>>
>>>>>>>>>>> I used a TextPipe filter to remove all XML tags, blanks from SOL &
>>>>>>>>>>> EOL and all blank lines.
>>>>>>>>>>> The output file is now contents.pp.txt
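For anyone without 7-Zip, Notepad++ and TextPipe, the steps above could be approximated in a short script. This is a sketch, not the same pipeline: it breaks lines at paragraph and heading ends rather than at pretty-printed XML boundaries, and the function names are hypothetical. Note that the main document part inside an ODF package is named content.xml (called contents.xml in this thread).

```python
import html
import re
import zipfile


def xml_to_lines(xml_text):
    """Strip XML tags and return the non-blank, trimmed text lines."""
    # End of a paragraph or heading becomes a line break.
    text = re.sub(r"</text:(?:p|h)>", "\n", xml_text)
    # Remove every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; left in the text content.
    text = html.unescape(text)
    # Remove blanks at start and end of line, and drop blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def odt_plain_lines(odt_path):
    """Read content.xml from an .odt file and return its plain text lines."""
    with zipfile.ZipFile(odt_path) as archive:
        xml_text = archive.read("content.xml").decode("utf-8")
    return xml_to_lines(xml_text)


if __name__ == "__main__":
    import sys

    for line in odt_plain_lines(sys.argv[1]):
        print(line)
```

The regular expressions are adequate for a quick look at the text; an XML parser would be the safer choice for anything that has to preserve structure.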
>>>>>>>>>>>
>>>>>>>>>>> The result is readable content in Myanmar Unicode, with some English
>>>>>>>>>>> text such as "The Gospel according Matthew" near the start.
>>>>>>>>>>>
>>>>>>>>>>> The file is best viewed using BabelPad with the option Display
>>>>>>>>>>> Colours | Colour Code by Script.
>>>>>>>>>>> This shows Myanmar characters in light green, and non-Myanmar
>>>>>>>>>>> characters in other colours.
>>>>>>>>>>>
>>>>>>>>>>> Observations:
>>>>>>>>>>> 1. The font conversion to Unicode left a few scattered characters
>>>>>>>>>>> unconverted. :(
>>>>>>>>>>>
>>>>>>>>>>> 0000C8 È 18 LATIN CAPITAL LETTER E WITH GRAVE
>>>>>>>>>>> 0000D8 Ø 20 LATIN CAPITAL LETTER O WITH STROKE
>>>>>>>>>>> 0000F2 ò 3 LATIN SMALL LETTER O WITH GRAVE
>>>>>>>>>>>
>>>>>>>>>>> The complete character frequency analysis is attached.
>>>>>>>>>>>
>>>>>>>>>>> 2. A few of what appear to be verse numbers are still present here and there.
>>>>>>>>>>> 3. The content contains section headings and parallel passage
>>>>>>>>>>> headings as well as verse text.
>>>>>>>>>>>
>>>>>>>>>>> I have just uploaded the file contents.pp.zip to a new folder in my
>>>>>>>>>>> Box account and added Cyrille & Michael as viewers.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> Sent with ProtonMail Secure Email.
>>>>>>>>>>>
>>>>>>>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>>>>>>> On Monday, May 13, 2019 9:19 AM, Cyrille
>>>>>>>>>>> [<lafricai...@gmail.com>](mailto:lafricai...@gmail.com)
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> I recently received a modern Myanmar translation of the NT, Psalms
>>>>>>>>>>>> and Proverbs, with permission to create a new module.
>>>>>>>>>>>> But there are many problems. The first is getting the text.
>>>>>>>>>>>> I tried different ways, but it was produced with PageMaker!
>>>>>>>>>>>> I can get the text, but I don't have the verse numbers, because
>>>>>>>>>>>> they sit in a parallel column, and when I copy the text I get
>>>>>>>>>>>> only the biblical text.
>>>>>>>>>>>> I also have a PDF, but when I convert it to text (with pdftotext)
>>>>>>>>>>>> the columns are mixed.
>>>>>>>>>>>> Can someone help me with any ideas?
>>>>>>>>>>>> The next problem is Unicode. The text is not typed in Unicode
>>>>>>>>>>>> but uses a special font.
>>>>>>>>>>>> I can send everything you need or push it to git.crosswire.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> sword-devel mailing list:
>>>>>>>>>>>> sword-devel@crosswire.org
>>>>>>>>>>>>
>>>>>>>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>>>>> Instructions to unsubscribe/change your settings at above page
[Attachment: counted list of Myanmar words containing a font conversion error.
The Myanmar text of this attachment was mis-decoded in this copy (UTF-8 bytes
shown as Latin-1, with some bytes lost) and cannot be reliably restored. Each
entry was a five-digit count followed by a word containing one of the anomalous
strings È, Ø, ò, "m;", "Kd;", "dS" or "S". The entries that can be matched
against the messages above are:]
00001 dS
00001 Kd;တို့၏ဘုရင်၊
00001 Sရှဖတ်၏သား
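A counted list of this kind could be regenerated from contents.pp.txt with something like the following. This is a sketch under assumptions: the function names are hypothetical, and "suspect" is taken to mean a space-delimited word that mixes Myanmar and Latin letters, so a stray all-Latin token such as "dS" would need a separate check to tell it apart from the genuine English text near the start of the file.

```python
from collections import Counter
import unicodedata


def is_myanmar(ch):
    """True for characters in the main Myanmar block (U+1000..U+109F)."""
    return "\u1000" <= ch <= "\u109F"


def is_latin(ch):
    """True for Latin letters, including È, Ø and ò."""
    return unicodedata.name(ch, "").startswith("LATIN")


def suspect_words(text):
    """Count words that contain both Myanmar and Latin letters."""
    counts = Counter()
    for word in text.split():
        if any(is_myanmar(c) for c in word) and any(is_latin(c) for c in word):
            counts[word] += 1
    return counts


def format_counted(counts):
    """Render the counts as 'NNNNN word' lines, sorted by word."""
    return [f"{n:05d} {word}" for word, n in sorted(counts.items())]
```

Running `format_counted(suspect_words(text))` on the decoded file contents would give lines in the same "count word" layout as the attachment.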
Code point Character Count Character Name
U+0020 11,545 SPACE
U+0028 ( 149 LEFT PARENTHESIS
U+0029 ) 149 RIGHT PARENTHESIS
U+002D - 1,091 HYPHEN-MINUS
U+0031 1 4 DIGIT ONE
U+0032 2 2 DIGIT TWO
U+0036 6 1 DIGIT SIX
U+0038 8 1 DIGIT EIGHT
U+003B ; 14 SEMICOLON
U+0047 G 1 LATIN CAPITAL LETTER G
U+004B K 1 LATIN CAPITAL LETTER K
U+004D M 1 LATIN CAPITAL LETTER M
U+0053 S 2 LATIN CAPITAL LETTER S
U+0054 T 1 LATIN CAPITAL LETTER T
U+0061 a 2 LATIN SMALL LETTER A
U+0063 c 2 LATIN SMALL LETTER C
U+0064 d 3 LATIN SMALL LETTER D
U+0065 e 3 LATIN SMALL LETTER E
U+0067 g 1 LATIN SMALL LETTER G
U+0068 h 2 LATIN SMALL LETTER H
U+0069 i 1 LATIN SMALL LETTER I
U+006C l 1 LATIN SMALL LETTER L
U+006D m 6 LATIN SMALL LETTER M
U+006E n 1 LATIN SMALL LETTER N
U+006F o 2 LATIN SMALL LETTER O
U+0070 p 1 LATIN SMALL LETTER P
U+0072 r 1 LATIN SMALL LETTER R
U+0073 s 1 LATIN SMALL LETTER S
U+0074 t 2 LATIN SMALL LETTER T
U+0077 w 1 LATIN SMALL LETTER W
U+00C8 È 18 LATIN CAPITAL LETTER E WITH GRAVE
U+00D8 Ø 20 LATIN CAPITAL LETTER O WITH STROKE
U+00F2 ò 3 LATIN SMALL LETTER O WITH GRAVE
U+1000 က 7,640 MYANMAR LETTER KA
U+1001 ခ 2,396 MYANMAR LETTER KHA
U+1002 ဂ 265 MYANMAR LETTER GA
U+1004 င 6,256 MYANMAR LETTER NGA
U+1005 စ 2,392 MYANMAR LETTER CA
U+1006 ဆ 1,020 MYANMAR LETTER CHA
U+1007 ဇ 376 MYANMAR LETTER JA
U+1008 ဈ 3 MYANMAR LETTER JHA
U+1009 ဉ 154 MYANMAR LETTER NYA
U+100A ည 3,621 MYANMAR LETTER NNYA
U+100B ဋ 4 MYANMAR LETTER TTA
U+100C ဌ 7 MYANMAR LETTER TTHA
U+100D ဍ 9 MYANMAR LETTER DDA
U+100F ဏ 79 MYANMAR LETTER NNA
U+1010 တ 5,765 MYANMAR LETTER TA
U+1011 ထ 1,461 MYANMAR LETTER THA
U+1012 ဒ 204 MYANMAR LETTER DA
U+1013 ဓ 43 MYANMAR LETTER DHA
U+1014 န 3,173 MYANMAR LETTER NA
U+1015 ပ 2,987 MYANMAR LETTER PA
U+1016 ဖ 974 MYANMAR LETTER PHA
U+1017 ဗ 38 MYANMAR LETTER BA
U+1018 ဘ 458 MYANMAR LETTER BHA
U+1019 မ 5,731 MYANMAR LETTER MA
U+101A ယ 1,455 MYANMAR LETTER YA
U+101B ရ 2,536 MYANMAR LETTER RA
U+101C လ 3,514 MYANMAR LETTER LA
U+101D ဝ 375 MYANMAR LETTER WA
U+101E သ 7,122 MYANMAR LETTER SA
U+101F ဟ 777 MYANMAR LETTER HA
U+1020 ဠ 1 MYANMAR LETTER LLA
U+1021 အ 3,239 MYANMAR LETTER A
U+1024 ဤ 215 MYANMAR LETTER II
U+1025 ဥ 81 MYANMAR LETTER U
U+1026 ဦ 198 MYANMAR LETTER UU
U+1027 ဧ 42 MYANMAR LETTER E
U+1029 ဩ 12 MYANMAR LETTER O
U+102B ါ 1,453 MYANMAR VOWEL SIGN TALL AA
U+102C ာ 9,440 MYANMAR VOWEL SIGN AA
U+102D ိ 8,154 MYANMAR VOWEL SIGN I
U+102E ီ 876 MYANMAR VOWEL SIGN II
U+102F ု 8,430 MYANMAR VOWEL SIGN U
U+1030 ူ 2,760 MYANMAR VOWEL SIGN UU
U+1031 ေ 7,541 MYANMAR VOWEL SIGN E
U+1032 ဲ 589 MYANMAR VOWEL SIGN AI
U+1036 ံ 1,129 MYANMAR SIGN ANUSVARA
U+1037 ့ 5,309 MYANMAR SIGN DOT BELOW
U+1038 း 7,959 MYANMAR SIGN VISARGA
U+1039 ္ 293 MYANMAR SIGN VIRAMA
U+103A ် 18,107 MYANMAR SIGN ASAT
U+103B ျ 2,344 MYANMAR CONSONANT SIGN MEDIAL YA
U+103C ြ 4,347 MYANMAR CONSONANT SIGN MEDIAL RA
U+103D ွ 1,762 MYANMAR CONSONANT SIGN MEDIAL WA
U+103E ှ 2,546 MYANMAR CONSONANT SIGN MEDIAL HA
U+1040 ၀ 90 MYANMAR DIGIT ZERO
U+1041 ၁ 359 MYANMAR DIGIT ONE
U+1042 ၂ 242 MYANMAR DIGIT TWO
U+1043 ၃ 187 MYANMAR DIGIT THREE
U+1044 ၄ 137 MYANMAR DIGIT FOUR
U+1045 ၅ 89 MYANMAR DIGIT FIVE
U+1046 ၆ 81 MYANMAR DIGIT SIX
U+1047 ၇ 61 MYANMAR DIGIT SEVEN
U+1048 ၈ 67 MYANMAR DIGIT EIGHT
U+1049 ၉ 72 MYANMAR DIGIT NINE
U+104A ၊ 601 MYANMAR SIGN LITTLE SECTION
U+104B ။ 1,489 MYANMAR SIGN SECTION
U+104C ၌ 379 MYANMAR SYMBOL LOCATIVE
U+104D ၍ 564 MYANMAR SYMBOL COMPLETED
U+104E ၎ 54 MYANMAR SYMBOL AFOREMENTIONED
U+104F ၏ 1,699 MYANMAR SYMBOL GENITIVE
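A frequency table in this layout does not need BabelPad; a sketch using only the Python standard library follows. The function name is hypothetical, and the character names come from whatever Unicode version the local Python build ships with.

```python
from collections import Counter
import unicodedata


def char_frequency_rows(text):
    """Return 'U+XXXX char count NAME' rows, sorted by code point."""
    # Line breaks are layout, not content, so leave them out of the tally.
    counts = Counter(ch for ch in text if ch not in "\r\n")
    rows = []
    for ch in sorted(counts):
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X} {ch} {counts[ch]:,} {name}")
    return rows


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        print("Code point Character Count Character Name")
        print("\n".join(char_frequency_rows(handle.read())))
```

Any row whose name does not start with MYANMAR, other than spaces and expected punctuation, is a candidate for a font conversion error.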